EFFICIENT OBJECT RECOGNITION
USING COLOR QUANTIZATION
SIGNE ANNE REDFIELD
A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL
OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT
OF THE REQUIREMENTS FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY
UNIVERSITY OF FLORIDA
Signe Anne Redfield
I thank my husband, Karl, for helping me to ignore distractions and keeping
me motivated to finish, and my son, Rowan, for providing distractions when I needed
them most. I would not have finished this without the exceptional help of my advisor,
Dr. John G. Harris. Many thanks go to Dr. Michael Nechyba and Dr. A. Antonio
Arroyo, for the use of equipment and their constant support and encouragement, and
to Dr. Keith Doty, who got me interested in this project in the first place. Particular
thanks go to my family, who let me bounce ideas off of them until I finally figured
out what I needed to do, and especially to Mary Javorski, who gave me the kernel of
the idea that evolved into this dissertation.
This research was partially supported by a National Science Foundation grant
through the Minority Engineering Doctoral Initiative program, which among other
things supplied the computer I used to do the experiments and write the thesis, as well as travel money to attend conferences.
Many thanks also go to Publix Supermarkets, the Pepsi Bottling Company, and
the denizens of the Machine Intelligence and Computational NeuroEngineering Lab-
oratories for their donations of soda cans to my database.
TABLE OF CONTENTS
ACKNOWLEDGMENTS
LIST OF FIGURES
LIST OF TABLES
ABSTRACT
1 INTRODUCTION
2 ROBOTIC OBJECT IDENTIFICATION
2.1 Platform
2.2 Identification Methods
2.3 Histogram Indexing
2.4 Color Constancy
2.4.1 Retinex
2.4.2 Gamut Mapping
2.4.3 Other Methods
2.5 Fixing Histogram Indexing
3 COLOR MEMORY
3.1 Experimental methods
3.2 Data analysis
3.2.1 Sensitivity
3.2.2 Calibration
3.3 Summary
4 QUANTIZATION
4.1 Overview of Algorithms
4.1.1 Uniform Quantization
4.1.2 Dithering
4.1.3 Modified Uniform Quantization
4.1.4 Tree Quantization
4.1.5 Median-Cut Quantization
4.1.6 Vector Quantization
4.2 Number of Colors
4.2.1 Optimizing Accuracy
4.2.2 Lighting Shifts
4.3 Summary
5 THEORY AND RESULTS ON SYNTHETIC DATA
5.1 Theory
5.1.1 Structure and Methods
5.1.2 Theoretical Results
5.1.3 Larger databases
5.2 Synthetic Data
5.2.1 Single Hue Region
5.2.2 All Hues
5.2.3 Multiple Hues
5.2.4 More Complex Lighting Shifts
5.3 Over-fitting
5.4 Conclusions
6 STILL IMAGE RESULTS
6.1 Theoretical Comparison
6.2 Lighting Compensation
6.2.1 Workplace Lighting
6.2.2 Household Lighting
6.2.3 Full Database
6.3 Localized Hue Data
6.4 Orientation vs. Lighting Change
6.5 Chromaticity Results
6.6 Retinex
6.7 Summary
7 ROBOT PROTOTYPES
7.1 Off-Line Implementation
7.2 Real-Time Implementation
7.2.1 First Implementation
7.2.2 Results on First Implementation
7.2.3 Second Implementation
7.2.4 Fixed Lighting
8 CONCLUSION
A COLOR SPACES AND SENSORY SYSTEMS
A.1 Human Vision
A.1.1 Color Constancy
A.1.2 Color Memory
A.2 Color Spaces
A.3 Robot Sensory Systems
B DATABASES AND IMAGES
REFERENCES
BIOGRAPHICAL SKETCH
LIST OF FIGURES
3.1 Interface window for program enabling user to choose focal colors. Primary window shown here allows user to choose colors; secondary window shows focal colors after user has moved on to next color.
3.2 Interface for user to choose boundaries between focal colors. For each of seven levels of brightness, the user places lines where boundaries are perceived. Focal colors are assigned to the regions between the lines by the user.
3.3 Screen shots of psychophysical testing interface. (a) original color; (b) intersample noise; (c) test color; (d) simultaneous comparison.
3.4 Focal colors and initially displayed colors at LSB 5.
3.5 Sample receiver operator characteristics curve.
3.6 Monitor Calibration Results.
3.7 Results of p(A) analysis with actual change derived from monitor calibration.
4.1 Original image used to demonstrate results of different quantization schemes.
4.2 Results of uniform quantization.
4.3 Results of dithering on uniform quantization.
4.4 Results of modified uniform quantization.
4.5 Diagram for tree quantization.
4.6 Results of median-cut quantization.
4.7 Sample color changes under a single day's sunlight.
4.8 Average and standard deviation for hue under varying lighting. The x location of each bar indicates the average of the average values for the day.
4.9 Mean values for cans and calibrator hues.
4.10 Probability density functions for chromatic can colors along the hue axis.
4.11 Probability density functions for all can colors along the saturation axis.
4.12 Probability density functions for all can colors along the value axis.
4.13 Probability density functions for the colors in the red category.
4.14 Sample image quantized using the 8-color lighting-shift based classifier.
5.1 Diagram showing examples of each variable used in the theoretical equations.
5.2 Theoretical results for c = 2^24, p = 2 and p = 3, n = 3, and k varying.
5.3 Theoretical results for c = 2^24, p = 3, and k varying along the x axis. Each line corresponds to a different value of n.
5.4 Theoretical results for c = 2^24, n = 4, and k varying along the x axis. Each line corresponds to a different value of p.
5.5 Simulation (averaged) results for c = 2^24, p = 2 and p = 3, k = 10, and n varying.
5.6 Image of single hue region synthetic database. Red corresponds to higher values; blue to lower values.
5.7 Results on single hue region database for varying numbers of bins, using uniform (blue) and accuracy sweep (red) quantization. Average accuracy across the different shifts is shown in the lower right hand plot. Standard deviation of average value is shown with dotted lines. Progressively larger shifts from none (upper left hand plot) to 7 (middle lower plot) are shown in the remaining plots.
5.8 Database of objects completely spanning the hue axis.
5.9 Results of uniform (blue) and accuracy sweep (red) methods.
5.10 Original database for histograms with two colors.
5.11 Average accuracy of uniform (blue) and accuracy sweep (red) methods as a function of shift.
5.12 Results of uniform (blue) and accuracy sweep (red) methods.
5.13 Second two-peak database.
5.14 Average accuracy for second two-peak database as a function of shift.
5.15 Results of uniform and accuracy sweep methods on second two-peak database.
5.16 Transformation from average illuminant to early morning and evening illuminants.
5.17 Comparison of original and warped databases.
5.18 Results for training (original) and testing (warped).
5.19 Over-fitting of accuracy sweep method.
6.1 Sample soda images used in 14-can database. Soda shown here is Publix brand Diet Cola, under each of eight different illuminants.
6.2 Colormaps for 2, 3, 4, 5, 6, 8 and 14 colors.
6.3 Theoretical predictions (blue solid lines) for c = 2^24, p = 2 and p = 3, n = 3, and k varying. Real database results (red dashed lines) for c = 2^24, p = 3 and p = 5, n = 3 averaged over 20 sets, and k varying.
6.4 Real database results for n varying, with k = 8 and k = 14, p = 2 and p = 3. The blue solid lines show k = 14 and the red dashed lines show k = 8. The blue and red lines with triangle markers show the results from the comparison between theoretical data (blue) and real data (red).
6.5 Real and theoretical results for k = 1 to k = 25. Dotted line shows theoretical results for p = 2. Dashed line shows theoretical results for p = 3. Crosses show averaged real results for p = 2. Stars show averaged real results for p = 3.
6.6 Uniform quantization results for the workplace database.
6.7 Example of the characteristic knee in the accuracy curve for real data.
6.8 Comparison of uniform and accuracy sweep methods on the workplace database.
6.9 Full database accuracy vs. number of bins for uniform quantization.
6.10 Localized database accuracy vs. number of bins for uniform quantization.
6.11 Comparison of accuracy sweep (red) and uniform (blue) quantization methods.
6.12 Comparison of different numbers of orientations with respect to accuracy.
6.13 Comparison of lighting condition change to orientation change.
6.14 Images and 2-D histograms using [r, g] chromaticities.
6.15 Comparison of uniform quantization methods.
6.16 Histogram results for the full database, without color constancy.
6.17 Histogram results for the full database, with color constancy.
6.18 Comparison between uniform quantization results on full database for data with and without retinex pre-processing.
6.19 Comparison between results with different color constancy methods.
7.1 Sample image of input to system. Region used to create histogram is outlined in red; remainder of image is shadowed.
7.2 Sample histograms from database.
7.3 Comparison of 7-Up and Mountain Dew nutritional information.
A.1 RGB color space.
A.2 Munsell color space.
A.3 Munsell colors used by Berlin and Kay, with focal colors (dots) and region boundaries.
B.1 Sample image used to generate full database of 86 cans, with 8 different orientations and 4 different illuminants.
B.2 Samples of images used to generate full database of over 82 cans, with 8 different orientations and 4 different illuminants. Top row is not processed; bottom row is processed with the retinex algorithm.
LIST OF TABLES
3.1 Number of trials given bit-depth.
3.2 Number of trials given bit-depth.
3.3 Results of the d' and p(A) analyses.
3.4 Gamma Calibration Data.
6.1 Real data results for p = 3. CC shows the expected results if the algorithm is performing correct color constancy and accurately identifying each object.
6.2 Object recognition accuracy generated from database of 9 soda cans under 4 different lighting conditions. Ill. in the table refers to illuminant, and indicates the images that were used as the templates in the database, while the test data consisted of the remaining 27 images.
6.3 Accuracy on household database.
6.4 Accuracy on full database for lighting shift.
6.5 Accuracy on full database with orientation varied and lighting constant.
6.6 Accuracy on full database for lighting shift quantization methods.
6.7 Accuracy on full database with retinex pre-processing.
7.1 Number correct when identification of sodas is tested using orientation midway between those in the database. Eight cans with two orientations in database, fourteen colors. The numbers 1 and 2 correspond to orientations; Err indicates what was chosen instead of the correct soda. Codes are: Sch = Schweppes™, CT = Country-Time Lemonade™, EO = Eckerds Orange Drink™, WG = Welch's Grape Drink™, C = Coca-Cola™, D7 = Diet 7-Up™, CD = Canada Dry™, and 7U = 7-Up™.
7.2 Number correct when identification of sodas is tested using orientation midway between those in the database. Eight cans with four orientations in the database, eleven colors. The numbers 1, 2, 3, and 4 correspond to orientations; Err indicates what was chosen instead of the correct soda. Codes are: Sch = Schweppes™, CT = Country-Time Lemonade™, EO = Eckerds Orange Drink™, WG = Welch's Grape Drink™, C = Coca-Cola™, D7 = Diet 7-Up™, CD = Canada Dry™, and 7U = 7-Up™.
7.3 Number correct when identification of sodas is tested using orientation midway between those in the database. Ten cans with four orientations in the database, eleven colors. The numbers 1, 2, 3, and 4 correspond to orientations; Err indicates what was chosen instead of the correct soda. Codes are: Sch = Schweppes™, CT = Country-Time Lemonade™, EO = Eckerds Orange Drink™, WG = Welch's Grape Drink™, C = Coca-Cola™, CD = Canada Dry™, MD = Mountain Dew™, P = Pepsi™, SL = Slice™, and LDL = Lipton Diet Lemon Brisk Iced Tea™.
7.4 Number correct when identification of sodas is tested using orientation midway between those in the database. Eleven cans with four orientations in the database, sixteen colors. The numbers 1, 2, 3, and 4 correspond to orientations; Err indicates what was chosen instead of the correct soda. Codes are: Sch = Schweppes™, CT = Country-Time Lemonade™, EO = Eckerds Orange Drink™, WG = Welch's Grape Drink™, C = Coca-Cola™, P = Pepsi™, SL = Slice™, LDL = Lipton Diet Lemon Brisk Iced Tea™, 7U = 7-Up™, WC = Wild Cherry Pepsi™, and DP = Diet Pepsi™.
7.5 Number correct when identification of sodas is tested using orientation midway between those in the database. Twelve cans with four orientations in the database, sixteen colors. The numbers 1, 2, 3, and 4 correspond to orientations; Err indicates what was chosen instead of the correct soda. Codes are: Sch = Schweppes™, CT = Country-Time Lemonade™, MMO = Minute Maid Orange Soda™, WG = Welch's Grape Drink™, C = Coca-Cola™, P = Pepsi™, SL = Slice™, LDL = Lipton Diet Lemon Brisk Iced Tea™, 7U = 7-Up™, WC = Wild Cherry Pepsi™, DP = Diet Pepsi™, and SP = Sprite™.
B.1 Sodas in final database.
Abstract of Dissertation Presented to the Graduate School
of the University of Florida in Partial Fulfillment of the
Requirements for the Degree of Doctor of Philosophy
EFFICIENT OBJECT RECOGNITION
USING COLOR QUANTIZATION
Signe Anne Redfield
Chairman: J. Harris
Major Department: Electrical and Computer Engineering
A simplification of the color histogram indexing algorithm is proposed and an-
alyzed. Instead of taking a histogram consisting of hundreds of colors, each input
image is first quantized to only a few colors (between eight and sixteen) and the fea-
ture vector is generated by taking a histogram of this smaller space. This increases
the efficiency of the system by orders of magnitude. We also proposed that this would
reduce the effects of lighting change on the algorithm and that this would be a bet-
ter model for the human object recognition mechanism than the algorithm combined
with color constancy alone.
In support of the contention that this may be a better human model, a psy-
chophysical experiment was conducted. The bit-depth of human color memory was
shown to lie between 3 and 4 bits, corresponding to 8 to 16 color categories when a
color is remembered for five seconds. This experiment created a bridge between the
worlds of the psychophysical results and the computer tests.
The research showed that quantization can occasionally compensate for small
lighting changes, but that the compensation is highly database-dependent and erratic.
However, quantization always produced a much more efficient system and generally
did not substantially reduce the accuracy.
The results of this work were threefold. First, human color memory is relatively
poor, indicating that a system incorporating quantization will be far closer to mimick-
ing human abilities than a system without it. Second, quantization alone is insufficient
to perform color constancy in most cases. Third, with or without a color constant
pre-processor, our results consistently showed that quantization has little effect on
accuracy when using more than sixteen bins. Object recognition accuracy degrades
substantially as the number of color categories drops below six. From 10 categories to
256, accuracy is essentially unchanged. Quantization is a very efficient way to reduce
the computational complexity and storage requirements of this algorithm without
substantially affecting its object recognition accuracy.
INTRODUCTION
The purpose of this research is to explore the effect of quantization on object
recognition accuracy when color is the only cue. There are many ways in which a
system can identify an object. These sensory systems range from very simple (such
as distinguishing between obstacle and air via infra-red return) to extremely complex
(such as using multi-sensory input and sensor fusion to make sharp distinctions be-
tween specific objects). Object recognition using color images lies somewhere in the
middle of this range. Generally the system is being required to make fairly sophis-
ticated choices (track the red object but ignore the other red object), with limited
resources. There are many other cues that can and often should be used, even when
vision is stipulated as the primary sense, but here we will be exploring the effects of
quantization when only color is considered.
Object recognition using images is generally inefficient. The vast quantities of
input data tend to overwhelm systems attempting to extract only useful information.
Systems that recognize objects based on shape alone require moderate databases
and impose a substantial computational burden. Systems that rely on color have to
deal with triple the input information required for shape-based systems, and have
difficulty when the lighting changes. However, under controlled conditions, or when
the lighting changes are somewhat predictable, efficient object recognition using color
is possible. Furthermore, if a truly efficient object recognition algorithm using color
can be derived, more expensive techniques can be used on only the subset of data
produced by the color recognition process. This could dramatically speed up the
recognition process, even when multiple object properties are used as features.
Objects can be identified using many possible features. Here we look exclusively
at color, used as the only feature. Obviously, not all databases are suited for this
approach. Soda cans, however, form a very good data set for this experiment. They
are uniform in shape and thus easily segmented from their background, and they come
in a wide variety of colors. In addition, there are many potential uses for the system,
such as a butler robot. If you send your robot to get you an ice-cold Coca-Cola™,
the robot should be able to identify it reliably and not bring you Fresca™ instead.
Our research shows that quantization can make an appropriate object recognition
algorithm much more efficient.
Vision is a passive sense. Assuming we are using a robot to identify objects, the
robot does not need to disturb what it is looking at to identify the object. The robot
does not even have to be close to the object to be capable of identifying it. In addition,
vision is not object specific. The same sensor can be used to identify many different
types of objects. For example, humans can tell simply by looking which object is a
chair and which is a desk. They can even differentiate between the blue chair and the
red chair. This makes vision one of the most potentially flexible sensory modalities.
Furthermore, vision enables humans and, potentially, robots to precisely determine
the locations of relatively distant objects. Other senses are not as useful. Sonar, for
example, would allow a robot to approximately determine distant objects and closely
determine nearby objects. Infra-red reflections allow a robot to only approximately
determine nearby objects, and require the use of additional sensors to further define
its environment. In addition, in order to use such senses effectively, and eliminate
possible confusion, the placement of these sensors is critical. If they are poorly placed,
the robot will be unable to distinguish between critical situations. Furthermore, many
sensors are needed in order for these alternative sensory modalities to operate well.
With vision, however, a single camera can be placed anywhere with a good view and
will provide far more information. Vision is in many ways an ideal robotic sensor.
However, vision is also the source of many problems. Vision-based object recogni-
tion is plagued by the computationally intensive nature of image data. Color images
in particular require vast storage capabilities. An image of 128 by 128 pixels, with
8 bits or 256 levels per color band requires 393,216 bits just to store all the pixel
values. Almost all the algorithms using visual data also require substantial process-
ing to extract the relevant information. Image data contains remarkable quantities of
information, but extracting that information can be very time-consuming and compli-
cated. The same characteristic that makes vision such a wonderful sensory modality
also makes it very difficult to work with.
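The storage figure quoted above follows directly from the image dimensions and bit depth:

    128 \times 128 \text{ pixels} \times 3 \text{ bands} \times 8 \text{ bits/band} = 393{,}216 \text{ bits} = 49{,}152 \text{ bytes}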
It is, of course, possible to obtain tremendous amounts of information about the
world using only grayscale images. Why increase the robot's computational burden by
using color? If we assume that all the edge-detection and shape-based techniques we
would use for a grayscale image are already available, the addition of color allows us
to make the task of separating objects from their environment much easier. Suddenly
we can distinguish between an apple and an orange without complicated texture
analysis. We can even easily distinguish the ripe oranges from the unripe oranges.
Unfortunately, the computational complexity of the system will be greatly in-
creased. How can we use the wealth of information inherent in this sense without
overwhelming our system? Given the computational burden of shape based algo-
rithms, and the relatively smooth variation of chromaticity from pixel to pixel, per-
haps a statistical method working only with chromaticity would be an appropriate approach.
Swain and Ballard proposed the histogram indexing algorithm in 1991. Fun-
damentally, the algorithm simply looks at the statistical probability of the occurrence
of each color for a given object, and uses that information to determine the most likely
match. Chapter 2 explores the original algorithm in detail. It is sufficient to say here
that the algorithm has almost all the properties required for robust object recognition
in a realistic environment. It is robust to orientation changes, scaling, rotation and
deformation. It is unfortunately fragile when exposed to lighting changes and large
databases. Using color as the only feature dramatically reduces both storage and computational requirements.
Several ways to make the algorithm more lighting invariant have been suggested, and are explored in more detail in Chapter 2. The authors suggested the
use of a color constant preprocessor before the algorithm is run. The field of color
constancy includes all the algorithms that attempt to convert an image taken under
one light to an image taken under a different, default light. There have been many
approaches to this problem. Of these, Land's retinex theory [36, 37, 38, 39, 40, 30, 51,
10, 33, 49, 31, 67, 42] is one of the most widely researched and is relatively simple to
implement. Alternative methods include Forsyth's gamut mapping algorithm [23, 21]
and global methods such as subtracting the dominant color from the entire image, or
normalizing each pixel by its luminance.
Instead of color constancy methods, some researchers opt for a local method uti-
lizing shape information . Instead of taking histograms of the colors themselves,
this method incorporates color constancy into the algorithm by taking histograms of
color ratios. However, this simply adds a step to our research, as we would still be left
with the problem of quantizing the color ratios. All methods that incorporate color
constancy into the algorithm are more computationally complex than the original.
In addition, many simply are not good enough for object recognition. Funt et al.'s
paper  showed fairly conclusively that of the simple versions of the algorithms,
Forsyth's and Land's algorithms work almost equally well, obtaining roughly 70%
accuracy on their database. Their database is unrealistic, containing images of single
objects under dramatically colored single illuminants with a white patch available for
calibration purposes. The authors concluded that their results meant that current
color constancy algorithms were insufficient to reliably recognize objects in the real world.
For our purposes, the algorithms may or may not be sufficient, but they are
unquestionably too complicated and require far too many resources. In order for our
robot to function efficiently, it should not have to spend five minutes in front of the
fridge with the door open compensating for current lighting conditions.
Histogram indexing was biologically inspired. Psychophysical experiments showed
that this was a plausible approximation to one of the methods humans use to identify
objects. Color constancy is also biologically inspired. The retinex theory was devised
as a model for human color constancy. However, algorithms that approximate human
color constancy are not perfect. In fact, although the general perception may be that
humans are quite good at compensating for the illuminant, research has shown that
humans are decidedly imperfect [19, 69, 1]. Consensus in the biological literature is
that color constancy is instantaneous. Humans see a scene, and they are instantly
capable of determining the approximate hue of a given object without respect to
lighting. Background information on color constancy and other psychophysical phenomena is discussed in Chapter 2 and Appendix A.
The object recognition task, in and of itself, must incorporate a delay of some
sort. In order to recognize an object, one must have seen it before; thus the element
of time is implicit in the task. What happens to colors when they are stored in human
memory? Chapter 3 explores this issue. Human sensitivity to shifts in hue with a
five second delay corresponds to between 8 and 16 color categories.
We implement the degradation of color memory in a robotic system with quan-
tization. For the many possible colors that our camera can capture, we sort them
into categories via quantization and then use those categories to generate our his-
tograms. Chapter 4 explores the background of quantization, and specific methods,
in detail. Swain and Ballard showed that under strictly controlled lighting conditions
objects could be reliably recognized using color histograms. They used 512 colors to
prove their point, but showed little degradation for as few as 64 colors. However,
our previous experiments [28, 53, 52, 54] have shown that quantization to between 8
and 16 colors can produce improved recognition accuracy under varying lighting con-
ditions, and rarely decreases accuracy. Analysis of lighting shifts showed that when fluorescent light and daylight are the primary sources, six chromatic categories and two achromatic categories are sufficient for our databases.
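As a concrete illustration of the quantize-then-histogram pipeline described above, the sketch below maps each pixel of an RGB image to one of a small number of uniform color categories and builds the normalized category histogram used as the feature vector. The uniform palette here is purely illustrative; it is not the lighting-shift classifier developed in Chapter 4.

    import numpy as np

    def quantize_uniform(image, levels=2):
        """Map each 8-bit RGB pixel to one of levels**3 color categories."""
        idx = (image.astype(np.int64) * levels) // 256      # per-channel level in [0, levels)
        return (idx[..., 0] * levels + idx[..., 1]) * levels + idx[..., 2]

    def color_histogram(image, levels=2):
        """Normalized histogram over the quantized color categories."""
        labels = quantize_uniform(image, levels)
        hist = np.bincount(labels.ravel(), minlength=levels ** 3)
        return hist / hist.sum()

With levels = 2 this gives the eight coarse categories discussed above; raising it toward 256 per channel recovers the full 24-bit color space at a correspondingly higher storage and comparison cost.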
Our results on synthetic data are presented in Chapter 5. This chapter also in-
cludes theoretical and simulation results, showing that quantization is an effective way
of making the algorithm more efficient and is a potential replacement for more tradi-
tional color constancy algorithms. Chapter 6 contains the results when the database
is made up of still images of real objects. These databases range in size from 9 ob-
jects to over 80 objects, generally with at least 4 common lighting conditions. These
results show that using quantization as a substitute for color constancy algorithms
is impractical. However, quantization produces a far more efficient system, with or
without color constancy algorithms. Chapter 7 describes several prototype systems
using 10, 11, 14, and 16 colors with varying results. These systems show that for small
databases and minimally varying lighting, good lighting invariance can be obtained.
For more varied lighting, additional data under representative lighting conditions will
allow the algorithm to perform well. When color constancy algorithms are used with
quantization, the system combines the increased accuracy of the color constancy
algorithms and the increased efficiency of the quantization.
Chapter 8 summarizes the results of this research. In general, human color memory
is capable of distinguishing between members of 8 to 16 categories. The results of
experiments on still images and synthetic databases show that there is little to no
degradation in accuracy as the number of quantization categories is decreased to
between 8 and 16. Accuracy tends to decrease as the number of categories falls
below 8. In general, our results indicate that quantization is a very effective means
of increasing the efficiency of a system without decreasing its accuracy. In some
cases, quantization may increase the accuracy as the number of bins is decreased to
between 8 and 16. Our prototypes show that near perfect accuracy is obtainable in
real situations, with minimally varying lighting using only 11 color categories.
ROBOTIC OBJECT IDENTIFICATION
Our purpose is to demonstrate the effect of quantization on color object recog-
nition. Unfortunately, almost all color constancy algorithms published are deemed
;ood" by the authors, with no indication of a metric with which to judge them other
than the human cv- If we are to determine quantization can have the side effect of
color constancy, we need a more independent, reproducible metric. The accuracy of
color object recognition provides such a metric.
Let us assume we have a butler robot whose function is to deliver soda cans.
We would like our robot to identify these cans in a realistic laboratory environment.
Many experiments have tested algorithms in very austere environments , with fixed
environmental variables such as only allowing the robot to move within a rigidly con-
trolled area such as a hallway . Generally, the constraints on these environments
are justified by assuming the algorithm will be implemented in an equally constrained
environment, such as a factory floor. However, these algorithms generally break down
in more realistic, uncontrolled environments . Some of the earliest autonomous
robots responded to stimuli in simple ways but were capable of exhibiting behaviors
that we, as humans, associate with emotion . These robots had very limited sen-
sors, but were very robust. Vision, as the most complex sense available to robots, is
both extremely data-intensive and extremely fragile in the way it is usually used for object recognition.
In the laboratory where our robot must work, the lighting varies and the overall
environment changes often. Furniture is moved, creating new shadows and reflections.
The arrangement of drinks within the refrigerator changes. This is not a rigidly
controlled environment. In fact, it contains many of the characteristics of the real
world: it is minimally constrained, chaotic and unpredictable. This dissertation is not
concerned with the physical design of our robot, or with its manipulators, or with how
our robot finds the refrigerator, the offices, or even the cans within the refrigerator.
We could even use a simpler method to determine which can is which, such as reading
the bar codes on the sides of the cans. One purpose of this research is to determine
the effect of quantization on color object recognition with a view to determining its
effectiveness as a substitute for more elaborate color constancy algorithms, and we
use our robot's identification of the correct can as the metric with which to compare them.
Quantization has several advantages over color constancy. For example, it is
more useful in the case of an actual robotic implementation. In order for our robot
to function, the object recognition algorithm must satisfy certain constraints. First,
there is the issue of price. In order to keep the cost of the robot to a minimum, we want
an algorithm that will minimize use of the processor, minimize the necessary memory,
and not require expensive sensory equipment. In theory, a better sensor implies better
results. In actuality, given finite resources, a more expensive sensor will affect the
rest of the robot, possibly resulting in a cheaper processor, less memory, and perhaps
even compromises in the design of the robot's manipulators and its maneuverability.
Second, our robot should work in real time. In order to identify the can, the robot
will have to keep the door of the refrigerator open. The faster the robot can decide,
the sooner the refrigerator door will close and the less electricity will be wasted. In
addition, the faster the robot knows which is the correct can, the faster the can will be delivered.
These two constraints (price and speed) define the degree of acceptable computa-
tional complexity. Price determines how fast a processor we can use, with how much
memory. Price also determines the quality of the input data, by determining the
capabilities of the camera and frame grabber. The speed constraint combined with
the processor and memory constraints determine the complexity of the final object
recognition algorithm. The final criterion is that the system must be robust to as
many changes as possible in the object to be identified and in the environment. Price
must be minimized, while speed and robustness must be maximized. Quantization
will make the system more efficient, both by reducing the number of elements that
need to be stored in the database and by reducing the number of computational
operations that need to be executed in order to identify an object.
2.2 Identification Methods
There are many possible solutions to the soda identification problem. Instead of
actually asking our robot to identify the cans, we could standardize the organization
of the refrigerator. Sprite™ would always be in the same place, as would every other
soda. Unfortunately, this method is unlikely to work in the chaotic environment
assumed above. There is no guarantee that the person responsible for filling the
refrigerator will choose to buy the same drinks every time, or put them in the same
place. Furthermore, the contents of the refrigerator may be shifted to allow room for
items other than drinks.
We could try to simplify the system by using grayscale images instead of color
images. However, in order to recognize cans with grayscale images, we would need
to use shape-based methods. Cans tend to have similar distributions of curves and
angles, and every can has one side dedicated to nutritional information. The nutri-
tional information has approximately the same shape distribution for every can. If
we wanted to identify cans based on the nutritional information we would have to be
able to extract the words for the ingredient list, and match them up to a database
of words to determine the ingredients, which would then be compared to a known
database of objects. The nutritional content of different sodas is remarkably simi-
lar. If we want to identify the cans based on the logos of the different drinks, we
would need some form of shape-matching, and possibly character recognition [8, 2].
Fundamentally, shape-based methods tend to be both computationally intensive and
time-consuming, violating our cost criteria.
Instead, we implement Swain and Ballard's histogram indexing method. This
method, inspired by human psychophysical experiments, meets our criteria in almost
every way. It is very efficient. Instead of performing complicated shape-based cal-
culations to extract information, the computer generates a histogram of the colors
present in an image of the desired object. This histogram is normalized and used as
the feature vector to describe the object in the database. When an unknown object is
presented, its histogram is generated and compared to the histograms in the database.
Simple Euclidean distance is used to compare the histograms, and a nearest neighbor
classifier  is used to identify the objects. The algorithm satisfies the cost criteria
above. It is also robust to orientation changes, scaling, and rotation and deformation
of the object. The only area in which it fails is robustness to the environment when
the lighting changes.
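The identification step itself is only a few lines. The sketch below assumes each object in the database has already been reduced to a normalized, fixed-length color histogram (a 1-D array), and applies the Euclidean-distance nearest-neighbor rule described above.

    import numpy as np

    def identify(query_hist, database):
        """Return the database object whose histogram is nearest (Euclidean) to the query."""
        best_name, best_dist = None, float("inf")
        for name, model_hist in database.items():
            dist = np.linalg.norm(query_hist - model_hist)   # Euclidean distance between histograms
            if dist < best_dist:
                best_name, best_dist = name, dist
        return best_name

Because every model is just a k-bin vector, quantizing to a small k shrinks both the stored database and the cost of each comparison in direct proportion.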
Because of this failure, the algorithm is a good choice for testing the effectiveness of
color constancy methods. The histogram indexing algorithm's accuracy will become
close to perfect when the color constancy method is compensating adequately for the
lighting conditions. As the input becomes less and less color constant, the accuracy will degrade.
2.3 Histogram Indexing
The color histogram indexing algorithm was originally proposed as an instance of
animate vision for robotic systems. The two requirements it must satisfy are that
it must work in real time and that it must display environmental robustness. The
stated goals of the researchers were, first, to determine the identity of an object with
a known location and second, to determine the location of a known object.
To solve the first problem, that of determining the identity of an object, the
authors define a similarity metric called histogram intersection, which compares the
number of pixels of each color in the new image to the number of pixels of that color in
the model in the database. The model in the database is assumed free of background
(non-object) pixels, while segmentation of real data is assumed incomplete, or at best imperfect.
Histogram intersection is robust to distractions in the background of the object,
to viewing the object from a variety of viewpoints, to occlusion, and to varying image
resolution. It is not robust to varying lighting conditions or to large database sizes.
Swain and Ballard's definition of the intersection of two histograms is
\sum_{j=1}^{n} \min(I_j, M_j)                                  (2.1)
where I and M are histograms with n bins each. They obtained a match value from
zero to one by normalizing this intersection by the number of pixels in the reference
histogram, and scaling the image of the unknown object to the same number of pixels.
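In code, the match value of Equation 2.1 is a single vectorized operation. This sketch assumes I and M are raw (unnormalized) bin counts and that the unknown image has already been scaled to the same total pixel count as the model, as described above.

    import numpy as np

    def histogram_intersection(I, M):
        """Swain and Ballard's match value: bin-wise minima summed, normalized
        by the pixel count of the model histogram M (Equation 2.1)."""
        I = np.asarray(I, dtype=float)
        M = np.asarray(M, dtype=float)
        return np.minimum(I, M).sum() / M.sum()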
For efficient indexing into a large database, they calculated a match using only
some number of the largest bins. Their database contained 66 images of real objects
that were used as the training set, and a testing set of images including occluded
and rotated objects. They obtained 10to'. accuracy keeping the 10 largest bins, and
99.>.' accuracy keeping 200 bins. They simulated changing lighting conditions, but
only linear intensity changes, not chromatic or non-linear changes.
To solve the second problem, that of finding the location of a known object, they
define a ratio histogram R as
R_i = \min\left(\frac{M_i}{I_i}, 1\right)                                  (2.2)
The values of the histogram R replace the image values and the result is convolved
with a mask of the appropriate shape and size for the object in question. If the target
is in the image, a peak in the convolved image indicates its most likely location.
Results were good, with 5 occluded objects out of 32 corresponding to the 2nd highest
peak, 1 occluded object corresponding to the 7th highest peak, and the rest identified correctly.
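A sketch of the backprojection step is given below, under the assumption that the image has already been quantized to integer bin labels and that its histogram and the model histogram are available; the uniform smoothing filter simply stands in for "a mask of the appropriate shape and size for the object."

    import numpy as np
    from scipy.ndimage import uniform_filter

    def locate(label_image, model_hist, image_hist, mask_size=15):
        """Backproject R_i = min(M_i / I_i, 1) onto the image (Equation 2.2),
        smooth with a mask, and return the location of the strongest peak."""
        ratio = np.minimum(model_hist / np.maximum(image_hist, 1e-9), 1.0)
        backprojection = ratio[label_image]                 # each pixel gets the R value of its bin
        score = uniform_filter(backprojection, size=mask_size)
        return np.unravel_index(np.argmax(score), score.shape)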
The histogram indexing algorithm is extremely robust. It retains its accuracy
for many changes in the object, such as rotation, occlusion, scaling and deforma-
tion. It is, however, extremely sensitive to lighting changes. If the lighting changes,
then many pixels will change sufficiently to move from one histogram bin to another,
profoundly altering the overall histogram. To solve this problem, the authors sug-
gested a simple color constancy preprocessor. Many approaches to the solution of this
problem have been attempted. Color constancy based systems have shown limited
success, while shape based systems have shown slightly more. All these solutions
require dramatically more computations and memory than the original algorithm.
2.4 Color Constancy
The general color constancy problem has been studied extensively [1, 4, 9, 19, 23,
29, 39, 40, 42, 58, 69, 47]. Two basic assumptions underlie most of these solutions.
First, people do it well. This is addressed in more detail in Appendix A, but is
open to debate, depending on your criteria for "well." People are capable of color
constancy to some degree, but actual color-to-color matching exercises show that
people can perform well only if the color is perceived as an aspect of a concrete
object . Simply correcting for a given lighting condition without this association
seems to be very difficult. The second common assumption is that color constancy
is an instantaneous phenomenon, and therefore the effect of color memory on color
constancy is negligible in this context. This may be true. Many color constancy
experiments involving the same object rely on first one exemplar, and then a delay,
and then the exemplar under a different light. This was, in fact, one of the first
examples of color constancy in Land's and McCann's work [36, 37, 40]. However,
this delayed experiment inherently involves color memory, as color memory has been
shown to degrade over delays on the order of 200 ms. Even within a scene, as we
look from one object to another under the same light, it is not always obvious that two
colors are different. It is often necessary to place the objects side by side to enable an
observer to see the substantial difference between their colors. Researchers attempt to
get around this by having two known identical scenes visible to the participant, and
asking the participants to adjust the lighting on one scene so it matches the other.
There are two types of solutions to the color constancy problem. Global solutions
make up the first type. Generally, color constancy solutions rely on the stricture that
lighting changes have relatively low spatial frequency compared to surface property
changes. The most extreme manifestation of this, therefore, is to remove the DC color
component of a given image. Many global algorithms exist along these lines. Some
methods attempt to normalize the color values, to eliminate dependence on intensity
or some other characteristic. Finlayson has developed a color constant space
that effectively eliminates the illuminant. Unfortunately, in order to determine the
required projection that will take you into this space, you must first have examples
of the changes in lighting you wish to compensate for.
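A gray-world style correction is perhaps the simplest concrete example of this global approach: each channel is rescaled so that the image average (the "DC color component") becomes neutral. This is only an illustrative sketch of the idea, not any particular published algorithm.

    import numpy as np

    def gray_world(image):
        """Scale each channel so that the image mean becomes gray."""
        img = image.astype(float)
        channel_means = img.reshape(-1, 3).mean(axis=0)      # average R, G, B over the image
        gains = channel_means.mean() / np.maximum(channel_means, 1e-9)
        return np.clip(img * gains, 0, 255).astype(np.uint8)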
Local algorithms make up the second type of solutions to the color constancy
problem. These include Land's retinex theory of color constancy, first published in
1959 [36, 37]. The algorithm uses a very large local region (almost global) comparing
the current pixel value to the overall color surrounding it, with a steep discount as a
function of distance. In 1971 , he and McCann published additional results using
the retinex as a model for human color vision. In the mid-eighties, Land published
papers containing new versions of this theory [38, 39], which was revised and expanded
by other researchers throughout the eighties and nineties [9, 10, 4, 33, 49, 31, 42].
Most color constancy algorithms are based on what is known as von Kries' prin-
ciple. Coefficients independently adjust the gain of each photoreceptor (or channel)
to obtain surface color descriptors. These factors vary according to the author cited:
for example, Brill and West  use factors of one over the output of that photore-
ceptor for a known white patch. In addition to failing when the photoreceptor classes
are not independent of each other, this algorithm also illustrates a common failing
among color constancy algorithms. In general, color constancy algorithms require
either known surfaces (either known whites or known types of surfaces, such as Mon-
drians: flat, matte images of smooth color regions) or known lighting conditions, or both.
2.4.1 Retinex
In 1959, Edwin Land [36, 37] proposed the first biologically-based color constancy
algorithm. In the mid-eighties, with the advent of computers fast enough to simulate
the process, he revisited his previous work [38, 39]. These new algorithms explored
possible methods to improve the biological accuracy (incorporation of Mach band
effects, for example) and the simplicity of the computation. In the final form, Land's
algorithm is similar to homomorphic filtering, and relies on similar properties of
images. Homomorphic filtering enhances high frequencies and reduces low frequencies,
in the frequency domain. In the retinex algorithm, high frequencies are enhanced and
low frequencies are reduced in the spatial domain. The center/surround retinex uses
the following three steps to obtain a final relatively color constant image. First, for
each color channel independently, take the log of the value of each pixel. Second,
again for each color channel independently, subtract the result of convolving a local
surround function with the image from each pixel. Third, globally scale the resulting
values appropriately for image display. Much research has been done on which type
of surround function is best [33, 49, 31, 10, 42] and on the placement of the log
function. Jobson et al.  conclude that the Gaussian surround with the log taken
after the surround function instead of before provided the best results. However,
their interpretation of "best" seems distressingly free of any metric besides that of the
opinion of the researchers.
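A minimal single-scale center/surround retinex following the three steps listed above, with a Gaussian surround and the log taken after the surround (per Jobson et al.), might be sketched as follows; the surround width and the final display scaling are illustrative choices rather than tuned parameters.

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def single_scale_retinex(image, sigma=80):
        """Per channel: log(pixel) minus log(Gaussian-blurred surround), then rescale for display."""
        img = image.astype(float) + 1.0                      # offset avoids log(0)
        out = np.empty_like(img)
        for c in range(3):                                   # each color channel independently
            surround = gaussian_filter(img[..., c], sigma)   # local estimate of the illumination
            out[..., c] = np.log(img[..., c]) - np.log(surround)
        out = (out - out.min()) / (out.max() - out.min() + 1e-9)
        return (out * 255).astype(np.uint8)                  # step 3: global scaling for display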
Moore et al.  designed and built an analog VLSI chip that performed the
retinex algorithm on real time video data, with good results. Their research demon-
strated a fundamental problem with the retinex algorithm, as it then was. In images
with large regions of a single color, the gray world assumption forces those regions
to gray, even when the actual color is highly saturated. They solved this problem
by introducing a variance compensation mechanism, which uses the local variance to
determine how much to change the overall color. Again, the metric for "good" and
"poor" results depends entirely on the opinion of the authors.
In 1986, Brainard and Wandell  published an analysis of the retinex theory
in terms of color constancy. Their concern was with color constancy that retains the
colors of objects independent of both nearby objects and illumination. They analyzed
the retinex algorithm developed by Land in 1983  in this context. Land and
McCann  concluded that the retinex performs similarly to humans when only the
spectral composition of the illuminant is considered. Brainard and Wandell's paper
deals specifically with the effect of the algorithm on the perceived colors of objects
close to the object in question. They determined that "retinex is not a color constant
algorithm and that it is not an adequate model of human performance" (p. 1657).
The retinex is not a color constant algorithm because it normalizes the photoreceptor
gains to values that depend on the input image, rather than to a constant. It is not
an adequate model of human performance because it depends on the surfaces present
in a scene to a greater degree than a human would. However, the authors present this
conclusion with only the assertion that "A human observer ... perceives virtually no
change in the appearance of the upper two rows of chips" (p. 1656), whereas the
algorithm produces noticeable change in the chips' appearance. They do not provide
or use any quantitative metric.
More recently, researchers [33, 49, 31] have developed a multi-scale retinex al-
gorithm, with better results. These papers detail the development and testing of a
multi-scale retinex which achieves both color/lightness rendition and dynamic range
compression simultaneously. They use a Gaussian surround for their kernel, and per-
form the log function after the convolution with the surround kernel. They also use
a canonical gain/offset. The multi-scale version is simply the summation of several
single-scale retinexes, each with a different standard deviation Gaussian surround.
This is one of the few papers that compares the performance of the retinex to the
performance of other algorithms.
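The multi-scale version can be sketched as an equally weighted sum of single-scale retinexes over several surround widths; the scales and the final rescaling below are illustrative assumptions, not the published gain/offset values.

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def multi_scale_retinex(image, sigmas=(15, 80, 250)):
        """Average of single-scale retinexes (log center minus log Gaussian surround)
        computed at several surround widths, rescaled for display."""
        img = image.astype(float) + 1.0
        acc = np.zeros_like(img)
        for sigma in sigmas:
            for c in range(3):
                surround = gaussian_filter(img[..., c], sigma)
                acc[..., c] += np.log(img[..., c]) - np.log(surround)
        acc /= len(sigmas)
        acc = (acc - acc.min()) / (acc.max() - acc.min() + 1e-9)
        return (acc * 255).astype(np.uint8)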
The follow up to this study is the development of the multi-scale retinex with color
restoration . This algorithm adds a color restoration step after the multi-scale
retinex processing, and purports to produce images that closely mimic the human
viewing experience. The researchers have taken out a patent on their method 
which contains more explicit instructions for setting the various parameters.
2.4.2 Gamut Mapping
Forsyth's  work is the only other color constancy algorithm to approach the
performance of the retinex algorithm when used in conjunction with color indexing.
He presents two algorithms, Crule and Mwext, which are effective under two different
sets of conditions. Crule works for any surface reflectance if the photoreceptors are
narrowband. Mwext functions in the case where both surface reflectances and illumi-
nants are chosen from finite dimensional spaces. Experimental work with Crule shows
that for "good" constancy, "a color constancy system will need to adjust the gain of
the receptors it employs in a fashion analogous to adaptation in humans" (p. 5).
He uses 5 assumptions for the development of this algorithm:
1. All surfaces are flat, frontally presented, and there are no shadowing or mutual illumination effects.
2. There is a single, spatially uniform illuminant. Here, this means only that there
is only one illuminant at a time, not that there is only one type of illuminant.
The single, spatially uniform illuminant's chromaticity can be changed.
3. All surfaces are Lambertian and all reflection is diffuse. Surfaces vary only with
wavelength and do not fluoresce.
4. The problem he solves is defined in two parts: first, that the illuminant must
be estimated, and second, that some statement about the properties of the
surfaces in the image must be obtained.
5. "The product of any surface reflectance function, and any illuminant function,
and any photoreceptor sensitivity can be integrated with respect to wavelength.
Surface reflectance functions are neither greater than one, nor less than zero.
These are very weak assumptions." p. 7
Obviously, he is not solving the real world problems of naturally illuminated ob-
jects. Even the simplest real environment continually violates assumptions 1 and 2.
The wings of birds and butterflies (and all specular reflections) violate assumption 3.
There are several additional assumptions about the character of the illuminant.
1. "Illuminants are 'reasonable.' p. 7 This means that, for instance, a sample
object seen under monochromatic light could not be reasonably used to identify
the object under white light. Also, "it must be possible to describe each member
of the set of illuminants that one observes (e.g., with a parameterization)" (p. 7).
2. Photoreceptors are also "reasonable," in that the illuminants must excite the photoreceptors.
3. If the illuminant produces metamers (a pair of metamers are a pair of patches
whose underlying pigment is different but that evoke the same receptor response
under a given illuminant), there is no constraint on the algorithm to predict that
they will look different under different lighting.
4. The colors of the objects in the image are not "unreasonably" distributed, and
there are "sufficiently many" different colors. Examples of an unreasonable distribution
of colors would include a forest scene (almost entirely green and brown shades)
or a scene viewed through colored glasses. According to this, a reasonable
distribution with sufficiently many different colors would include scenes of man-made
environments, such as a child's playroom.
5. Photoreceptor outputs do not degrade substantially for deeply colored illumi-
nants ("For any illuminant, a reasonable measurement of the photoreceptor
outputs is possible," p. 7).
These assumptions are generally valid in the real world, except possibly for as-
sumption 4. Metamers, while common in theory, are rare in fact. Assumption 5
is a restriction on the illuminants where color constancy is possible rather than a
restriction on the receptors.
The basic task of the algorithm is to recover the RGB descriptor of a given sur-
face under a canonical illuminant. The canonical gamut is a convex set containing
all possible RGB responses under the canonical illuminant. Using this gamut, and
the above constraints, some illuminants can be ruled out and the remainder are a
linear transform away from the canonical illuminant. Crule solves for all the diagonal
matrices of coefficients that take the gamut of the image into the canonical gamut.
Due to the computational expense of calculating these matrices, an approximate so-
lution was calculated. Diagonal matrices can approximate matrices in the feasible set
due to the non-overlapping narrowband sensor constraint. Cuboid approximations
determine the intersection of the image gamut and the canonical gamut. The feasible
set with the largest volume is chosen as the mapping most likely to achieve color con-
stancy. This algorithm provided good results for Mondrian images with a sufficiently
diverse selection of colors.
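The full CRULE computation intersects convex sets of feasible diagonal maps, which is well beyond a few lines; the toy sketch below illustrates only the underlying diagonal-map idea, using a single per-channel gain that takes the observed gamut extremes to assumed canonical extremes.

    import numpy as np

    def diagonal_map(image, canonical_max=(255.0, 255.0, 255.0)):
        """Toy diagonal (per-channel gain) mapping toward a canonical gamut.
        Forsyth's CRULE instead searches the whole feasible set of such maps."""
        img = image.astype(float)
        observed_max = img.reshape(-1, 3).max(axis=0)        # extremes of the image gamut
        gains = np.asarray(canonical_max) / np.maximum(observed_max, 1e-9)
        return np.clip(img * gains, 0, 255).astype(np.uint8)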
Finlayson's extension of Forsyth's algorithm relaxes the constraints on the
illuminant, the surface shape, and the specularities (the illuminant assumptions, and
parts of assumptions 1 and 3 above). The author shows that confounding factors in
real images (such as specularities and shape) affect the intensity but not the color
orientation. Therefore, the algorithm uses 2D perspective chromaticity coordinates
instead of 3D RGB coordinates, and the intensity is not recovered. The 2D space is
given by r = R/B and g = G/B. Theoretically, if B is small, values could become
noisy, but the author reports no difficulties with this in practice. He also implements
a canonical illuminant gamut constraint similar to Forsyth's canonical surface color
constraint. Again, tests show that the algorithm performs well. The results shown
in this paper indicate that the angular error on real, highly colored images under
standard daylight and tungsten illuminants is less than 10 degrees. This is only a
few degrees more inaccurate than the best possible with diagonal approximations to the illuminant change.
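Computing the 2D perspective chromaticities themselves is trivial; the small floor on the blue channel in this sketch is just a guard against the near-zero B values noted above.

    import numpy as np

    def perspective_chromaticity(image, eps=1e-9):
        """Map RGB pixels to (r, g) = (R/B, G/B), discarding intensity."""
        img = image.astype(float)
        B = np.maximum(img[..., 2], eps)
        return img[..., 0] / B, img[..., 1] / B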
Yet another approach based on the gamut mapping method, Barnard et al.'s 
algorithm identifies illumination variations across an image, and removes them. It also
uses the illumination information gathered to constrain the color constant solution.
Instead of a restriction on the reflectances, it requires sufficient variation in either
the reflectances, the illuminants, or in a combination of both. They interpret the
color constancy problem in the same way as Forsyth, taking images of scenes under
unknown illumination and determining the camera response to the same scene under
a known, canonical light.
2.4.3 Other Methods
Some researchers avoided color constancy by introducing a shape-based method,
using local color information. In 1995, Ennesser and Medioni  used the Where's
Waldo images to test their local color information adaptation of the histogram
indexing algorithm. Novak and Shafer  explicitly assumed the availability of a
color calibrator in the field of view of the camera. If you have something to calibrate,
the problem unquestionably becomes simpler, but we cannot assume a color chart
will be available or usable.
2.5 Fixing Histogram Indexing
In 1995, Funt and Finlayson attempted to eliminate the need for a color
constancy preprocessor by incorporating illumination independence into the algo-
rithm. Instead of indexing on the photoreceptor class triples, they indexed on an
illumination-invariant set of color descriptors: the derivative (Laplacian) of the loga-
rithm of the colors. This is equivalent to indexing on the ratio of neighboring colors,
similar to Land and McCann's work on color constancy . This variation does not
work with saturated image pixels, because ratios based on those pixels are unlikely to
be constant. Their algorithm works almost as well as Swain and Ballard's under the
same lighting conditions. Their Laplacian of Gaussian operator identifies 19 objects
correctly out of 25 images of real objects under the original illuminant, but the toler-
ance is poor and two of the objects are "very poorly matched" (ranks of 18 and 27).
Swain and Ballard's algorithm identifies 23 of these 25 real objects correctly, and the
two identified incorrectly were the second most likely choices. Next, the algorithms
are compared using synthetic Mondrian input. Both algorithms should show perfect
accuracy on this data when the illuminant is unchanged. When under differing, spa-
tially constant illuminants, Funt and Finlayson's algorithm correctly identifies all of
the objects, while Swain and Ballard's produces zero intersections for 155 of the 180
objects and correctly identifies only 20. When tested with illuminants that varied
spatially in intensity and color, again color constant color indexing (Funt and Fin-
layson's method) produced perfect results while the original color indexing algorithm
identified only 7 of 30 correctly and failed on 12 of the Mondrians. Results on real
data were similar, with color constant color indexing producing one error with a rank
of 2 out of 11 objects under different illuminations. Color indexing alone in the best
case identified only 14 of 22 correctly, with 5 images having a rank of 2 and 3 having
a rank greater than 3.
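As a sketch of this style of illumination-invariant descriptor (not Funt and Finlayson's code), the following Python fragment histograms a Laplacian of Gaussian of the log of each channel of a floating-point RGB image; the bin count, histogram range, and Gaussian width are illustrative assumptions rather than values from the original paper.

```python
import numpy as np
from scipy.ndimage import gaussian_laplace

def ratio_histogram(image, bins=32, value_range=(-1.0, 1.0), sigma=1.0, eps=1e-6):
    """Histogram the Laplacian of Gaussian of log(channel), which approximates
    indexing on ratios of neighboring colors rather than on the colors themselves."""
    image = np.asarray(image, dtype=float)
    features = []
    for channel in range(3):
        d = gaussian_laplace(np.log(image[:, :, channel] + eps), sigma=sigma)
        counts, _ = np.histogram(d, bins=bins, range=value_range)
        features.append(counts)
    v = np.concatenate(features).astype(float)
    return v / v.sum()   # normalize so the descriptor is scale-independent
```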
Plainly, the color constant color indexing algorithm is better than color indexing
alone, as long as lighting shifts are guaranteed to occur. If the lighting is the same, the
algorithm does not perform as well as the original.
All of the methods described here are more computationally intensive than the
original algorithm, and require much greater storage. Chapter 3 addresses the color
memory of the human visual system in terms of bit-depth, paving the way for a
detailed exploration of quantization methods in Chapter 4.
COLOR MEMORY
Psychophysical experiments are the primary method available for characterizing
the visual processes of the human brain. The only other major method is the analysis
of case studies of patients with usual and unusual problems, ranging from simple
color-blindness to true color constancy failures.
Both color constancy and color memory have been studied extensively. Unfortu-
nately, due to the time-consuming and tedious nature of the experiments, the number
of participants and the extent of the tests are often severely limited. In addition, the
variability of color perception between individuals is often high. Even for a single
individual, results from a given experiment will vary substantially from one time
to another. Background information on color constancy was presented in depth in
C'!i pter 2.
Various researchers have investigated color memory [60, 59, 61, 5, 3, 16, 14, 15,
41, 27, 46, 63]. Uchikawa et al. [60, 63] showed that for both single and multiple
colors, humans remember colors as members of the Berlin and Kay color categories.
Furthermore, instead of becoming less accurate as more colors were added to the
task, participants would forget one or more of five colors completely while retaining
roughly the same ability with the remaining colors.
Other research [62, 61, 59, 14] indicated that our ability to discriminate between
colors deteriorates with memory. However, none of these experiments was performed
in a color space easily transformed into the color coordinates used in a computer, or in
a robotic system. The research indicated that discrimination thresholds were larger
for successive viewing than for simultaneous viewing (both colors viewed simultane-
ously on the fovea). In addition, colors were remembered with a shift towards higher
saturation [59, 15]. There was no discernable shift of hue in memory [46, 15]. We
performed an experiment to determine a rough estimate of the accuracy of human
color memory in terms of bit-depth.
In general, psychophysical experiments to determine characteristics of human color
vision are designed to obtain results that are as precise as possible, in whatever
coordinates are most convenient for the researcher's interpretation. These coordinates
include OSA uniform color scales [60, 63], wavelength, or the CIE spaces.
Many experiments  use a physical stimulus such as standardized colored chips,
which, except under certain fully controlled conditions, are impossible to perfectly
transform into a computer's representation of color. These experiments also generally
allow the subject to view the stimulus only for a short, fixed period of time, resulting in
a more controlled experiment but a less relevant assessment for real-world situations.
Because our algorithm is concerned with the smallest possible number of colors,
we want an optimistic estimate of the accuracy of human color memory. By finding
the largest realistic value for human color memory, we can determine a good target for
our robotic system. The purpose of this experiment was to determine this estimate of
the accuracy of human color memory under conditions close to those a person might
actually encounter in daily life.
3.1 Experimental methods
The experiment was performed on an Apple 400 MHz Rev. D iMac DVT, running
the Student Version of MATLAB 5.0. The overall luminance characteristic was used
to calibrate the monitor, and the luminance characteristic of each gun was used to
determine each gun's gamma. However, because the purpose of this experiment was
specifically to find an estimate of the accuracy of color memory in terms of computer
bit-depth, no compensation was incorporated for human perception of lightness or
saturation variation along the hue axis. Thus, if a shift in the hue value of a color
produced a shift in perceived lightness or saturation of that color, that shift was
considered part of the perturbation being tested. In order to maintain as much
realism in the structure of the experiment as possible, subjects were given as much
time as they wanted to determine their answers. In addition, they were also allowed to
use mental tricks to fix the color in their memories. The environment was austere (no
color charts) and they were not allowed to hold objects up to the computer monitor.
However, other purely mental games, such as comparing the color to a known color
such as the color associated with a sports team, were allowed. The experiments took
place in two locations, one in a room with windows and the blinds closed, and the
other in a room without windows. In both cases, the lights were always in the same
configurations, with all overhead fluorescent lights on. However, in the first room
there was a small amount of additional light from cracks in the blinds. The subjects
were allowed to adjust the computer monitor and their position for the best viewing.
The only requirement was that the participants be able to see the colors clearly. In
each case, subjects were allowed to look away from the monitor whenever they liked.
In all likelihood, under normal circumstances humans would not perform as well on
color memory tasks as this experiment implies. But for our purposes, we wish to find
the most generous estimate.
First the subjects were asked to provide some baseline data. Using an interface
programmed in MATLAB, they indicated their choices for the best example of each
of the Berlin and Kay  color categories. This interface is shown in Figure 3.1. In this
figure, the primary window (Figure No. 1) shows the color plane. The participants
were assigned one focal color at a time. In this image, pink, red, orange and yellow
have been assigned, and the participant is selecting a color to represent green. The
participant was asked to use the mouse to choose the color that best represented the
given focal color. The color chosen was displayed in the black box shown in this
image. If the participant was unsatisfied with the color, they continued to select
colors until they were comfortable with the assignment. Once they were satisfied,
they clicked on the "Done" button. This reset the window to the next color and
assigned the chosen color to its space in the colormap on the right. This window,
entitled Figure No. 2 in Figure 3.1, showed the colors chosen so far. Different color
planes derived from the HLS and RGB spaces were available to the participant, but
the default was the Hue-Lightness plane. This plane, from Hue-Lightness-Saturation
space, was displayed with saturation fixed at the maximum. This space was chosen
because it uses a single axis to represent chromaticity, thus reducing the number of
trials necessary to characterize each participant's sensitivity.
Next, again using an interface programmed in MATLAB, they were asked to
provide the boundary information for the same plane, this time subdivided into seven
lightness levels. This is shown with the middle level in Figure 3.2. In this figure, the
participant has placed lines indicating where their boundaries between focal colors
occur, using the "Create Boundary Line" button and the mouse. The participant is
using the "Select Region Co l.-' menu to assign a color to one of the outlined regions.
Once all regions were assigned, the participant clicked the "Show Selection" button
to display the results of their mapping, using the focal colors chosen in the previous
step. This allowed the participant to make sure they allocated the focal colors to the
regions as intended. When the participant was satisfied with the boundary locations
and color regions, they clicked the "Lightness Done" button. This reset the window
with the next lightness level. This procedure was repeated until all seven lightness levels had been completed.
Once these preliminary tasks were over, the main experiment began. Three sets
of ten pairs of colors were generated. One of each pair was displayed first, and the
second (a perturbed version of the first color) was displayed after a five second delay.
For each set of ten pairs, one of the initial colors was the focal color chosen by the
participant and the other nine were offset from this focal color in RGB space by shifts
Figure 3.1: Interface window for program enabling user to choose focal colors. Pri-
mary window shown here allows user to choose colors; secondary window shows focal
colors after user has moved on to next color
randomly generated from a Gaussian distribution with a .05 standard deviation. The
original RGB values ranged from zero to one. Values greater than one or less than
zero after the shift were rounded to one or zero, respectively. The resulting randomly
shifted points were checked to ensure there were no duplications within a set.
For each set of ten pairs, one displayed color was not perturbed, and the rest were
perturbed by altering the value of the hue component of the color in question, flipping
the bit of the bit-depth being tested. This corresponded to a shift in one direction
or the other of some power of two. For example, if the program's bit-depth was
set to five (the default starting value) and the color's hue value was 10011000, then
the perturbed color would be 10001000, with the fifth least significant bit flipped,
corresponding to a shift of 16 along the hue axis.
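A minimal Python sketch of this perturbation scheme is shown below (the original experiment was run in MATLAB; the helper names here are hypothetical).

```python
import numpy as np

def flip_hue_bit(hue, bit_depth):
    """Flip the given least significant bit of an 8-bit hue value; e.g.
    bit_depth=5 flips the fifth LSB, a shift of 16 out of 256."""
    return hue ^ (1 << (bit_depth - 1))

def jitter_focal_color(rgb, std=0.05, rng=None):
    """Offset a focal color in RGB space by Gaussian noise with a 0.05 standard
    deviation, clipping out-of-range values to 0 or 1 as described above."""
    rng = np.random.default_rng() if rng is None else rng
    return np.clip(np.asarray(rgb, dtype=float) + rng.normal(0.0, std, 3), 0.0, 1.0)

print(format(flip_hue_bit(0b10011000, 5), '08b'))   # 10001000, a shift of 16
```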
In theory we could have simply added or subtracted a given number from a certain
color, and then swept that number from the smallest to the largest. In order to
reduce the search space, several constraints were introduced. First, to simplify the
transition to a robotic system, only integer bit-depth shifts were introduced. If we
accept the blue-yellow, red-green perceptual dichotomy, then an integer greater than
two in base-two bit-depth will always be capable of being interpreted in terms of these
Figure 3.2: Interface for user to choose boundaries between focal colors. For each of
seven levels of brightness, the user places lines where boundaries are perceived. Focal
colors are assigned to the regions between the lines by the user.
channels (e.g., flipping the 5th least significant bit is equivalent to 16 regions, which
is equivalent to segmenting the space into 4 regions for each quadrant (blue-red,
blue-green, yellow-red, yellow-green) of the perceptual color space). Second, we used
the HLS space, rather than RGB or one of the CIE spaces. Again, this reduces our
search space, and thus the time necessary for each participant to donate. Instead of
having three axes to search (RGB) or two (CIE spaces) we have only one, hue. Eight
bits were allocated for each axis in HLS space. Therefore, the maximum possible
bit-depth used would be eight, which would change the value to the opposite hue.
Flipping the fifth least significant bit, as above, would correspond to shifting the
hue value by 2^4, or 16 out of 256. Third, we reduced the experimental search space
by a factor of two by doing only one shift per color. We could have presented two
trials for each color, one with the perturbation added and one with the perturbation
subtracted. Instead, in the interests of being able to test a larger number of colors,
we allowed only one perturbation per color.
The subject viewed each color on a black background, for as long as they felt
necessary. This initial display is shown in Figure 3.3, part (a). The color is shown
in a square in the center of the screen. The remainder of the screen is black, except
for instructions in white text in the upper right-hand corner. When the participant
was satisfied with their knowledge or perception of the color, they hit a key, which
triggered the computer to display 5 seconds of black and white temporal and spatial
noise over a square slightly larger than the color sample. A sample screen shot of the
noise is shown in Figure 3.3, part (b). Then the second color in the pair (perturbed
or not perturbed) was displayed, as in Figure 3.3 part (c), and again the subject
had as much time as they felt necessary to decide whether they were
seeing the same color or a different color and to indicate their choice to the computer.
After ten of these trials, the same ten color pairs were displayed simultaneously, one
pair at a time, as nested squares (shown in Figure 3.3, part (d)). The subject was
asked whether they perceived one square of a single color or two nested squares of
different colors. Again, MATLAB was used to control the monitor and the subjects
were given as much time as they needed to come to a decision. This experiment was
created with the free Psychophysics Toolbox for MATLAB to display the colors
and the interval noise, and to record the keyboard responses.
Consensus in the literature [60, 46, 5, 63] is that human memory for colors is
worse than human instantaneous perception of colors. In signal detection terms, we
are attempting to determine the sensitivity of human observers to visual noise (color
perturbations) when a given delay is present. Consistent and inconsistent responses
were generated, as well as the usual hit rate and false alarm rate. This portion of
the experiment also identified the people who were careless in their responses: if
a subject saw two colors every time, they were obviously not paying close enough
Figure 3.3: Screen shots of the psychophysical testing interface. (a) original color; (b)
intersample noise; (c) test color; (d) simultaneous comparison.
attention or mistaking the inverse delay response (the afterimage of the previous set
for a difference in the current set) for an actual difference in hue, as a priori there was
always one pair of identical colors in each set of ten. The participants were given
no information about how many pairs contained perturbed colors and how many
were not perturbed out of any given set.
Three sets of ten trials were performed for each focal color. If the subject per-
formed well enough on a given set of ten, the bit-depth was reduced by one (the
number of values shifted was reduced by a factor of two) for the next set. If the sub-
ject performed poorly enough, the bit-depth was increased. In order for the hue shift
to be reduced, the subject had to respond to at least 70% of the trials consistently
(simultaneous and successive responses were the same). To increase the hue shift, the
subject had to provide inconsistent responses (meaning the simultaneous and succes-
sive responses differed) on at least 70% of the trials. Breaks were scheduled every 20
minutes, and many people chose to continue another day rather than continue the
session after the first break. The preliminary data-gathering happened each time a
subject started a session. Thus, if someone chose to continue the experiment on a
new day, they would select new focal colors and new boundaries as before, and those
colors would be used in the new session. However, if the subject chose to continue
after the break, new focal colors and boundaries were not generated. As a result, the
colors used were representative of that subject's focal colors on that day, under those
lighting conditions. If the subject was willing to continue after the trials for the eight
chromatic focal colors, the same procedure was carried out for the boundary colors
chosen at the central lightness value.
3.2 Data analysis
Twenty-two people participated in this experiment. This resulted in a total of
5885 signal trials and 635 noise trials. A signal trial is defined as a single pair of
colors whose hues differed and which had valid responses from the participant. A
noise trial is defined as a single pair of colors whose hues did not differ and which also
had valid responses from the participant. If the participant gave a response that was
not one of the acceptable responses ("same" or "different" for the delayed response,
"1" or "2" for the simultaneous response) the trial was discarded.
Figure 3.4 shows the focal colors chosen by the participants in terms of hue (black
x's) and the actual initial hues used for the fifth LSB testing (red). The focal colors
are concentrated in the left side of the plot, corresponding to red, oranges, yellows
and browns. The remaining colors (green, blue and purple) are slightly clustered
in groups in the remainder of the hue space. The testing data covers the hue axis
well, with more representation in the regions with clustered focal colors. The y-axis
shows the number of responses obtained for each focal color/test color. Each hue was
rounded to a number between 0 and 255, resulting in multiple trials for most hues.
Figure 3.4: Focal colors and initially displayed colors at LSB 5.
Table 3.1 shows the number of hits and the number of signal trials. The hit rate
is calculated by taking the percentage of signal trials that were hits. Z(hit rate) is used
to determine the value of the sensitivity metric. Table 3.2 shows the same data for
the false alarms.
These tables indicate clearly that many people were able to reliably identify shifts
in color when the fifth LSB was flipped, but that few were good enough to progress
to the stage where the third LSB was flipped. Very few did so badly that they got to
the sixth LSB, but on determining this, additional trials were run at both the sixth
and seventh to compensate. Only one person made any errors at the seventh LSB.
Simply looking at these results would cause one to hypothesize that a good sensitivity
threshold would lie somewhere between the fourth and sixth LSB.
Table 3.1: Number of trials given bit-depth
Bit-depth Signal Trials Hits Hit Rate Z(Hit Rate)
3 428 248 0.579439252 0.200459453
4 1477 814 0.551117129 0.128484317
5 3340 1812 0.54251497 0.106771267
6 543 449 0.826887661 0.941936378
7 97 95 0.979381443 2.041142579
Table 3.2: Number of trials given bit-depth
Bit-depth Noise Trials False Alarms F.A. Rate Z(F.A. Rate)
3 42 19 0.452380952 -0.119648575
4 153 44 0.287581699 -0.560462468
5 330 92 0.278787879 -0.586446731
6 97 10 0.103092784 -1.264124876
7 13 0 0 -∞
3.2.1 Sensitivity
Sensitivity analyses were performed on the data to determine roughly how sensitive
humans are to bit-depth variation. The two sensitivity metrics used produced similar
results.
Two metrics derived from signal detection theory produce very similar results. A
parametric sensitivity measure, known as the d' metric , and a non-parametric
sensitivity measure, known as the p(A) metric, are presented.
The receiver operator characteristics (ROC) curve is helpful in visualizing the d'
metric. The ROC curve is generated by plotting the hit rate against the false alarm
rate. The hit rate is the probability that an observer will generate a true positive,
given that the two colors are different. The false alarm rate is the probability that
an observer will generate a false positive given that the two colors are the same. An
observer who is simply guessing will produce a hit rate equal to the false alarm rate.
Figure 3.5: Sample receiver operator characteristics curve.
An observer who can discriminate perfectly will have no false alarms and a hit rate of one.
The d' metric takes a [hit rate, false alarm rate] point, fits Gaussian signal and
noise distributions to it, and looks at the difference between their means, assuming
both Gaussians have the same variance. A constant d' produces a curve like the dashed
line in Figure 3.5.
The p(A) metric looks at the same point and interpolates linearly between the end
points of the ROC curve and finds the area under the interpolated curve.
The d' metric measures sensitivity as the separation between the means of the
distribution of the signal trials (there is a difference between the two colors) and the
noise trials (there is no difference between the two colors). First, the distance of the
observer's criterion from the means of the two distributions is calculated. This is done
by converting the hit and false alarm rates to z scores, as shown in Tables 3.1 and 3.2.
The hit rate is the probability that the observer said the stimuli were different with
Table 3.3: Results of the d' and p(A) analyses.
Bit-depth d' p(A)
3 0.3192 0.5634
4 0.6889 0.6317
5 0.6931 0.6319
6 2.2061 0.8619
7 ∞ 0.9897
the delay, given that the observer viewed stimuli that were different. The false alarm
rate is the probability that the observer said the stimuli were different with the delay,
given that the stimuli were the same. The data are mapped to a Gaussian probability
distribution, and the z score gives the distances desired. The d' measure is computed
by taking the difference between the z score for the hits and the z score for the false
alarms, as shown below.
d' = z(hit rate) - z(false alarm rate) (3.1)
This is equivalent to calculating
d' = (μS - μN)/σ (3.2)
where μS is the mean of the signal trials, μN is the mean of the noise trials, and σ is
their common standard deviation.
The lower limit for d' is zero, which indicates no ability to discriminate between
signal trials (different colors with delay) and noise trials (same colors with delay).
This is indicated as the dotted line in Figure 3.5. The theoretical maximum is ∞,
corresponding to perfect ability to discriminate and thus no false alarms, represented
by the point in the upper left corner of the plot in Figure 3.5. The usual threshold for
ability to discriminate is d' of one, plotted as a red dashed line. Table 3.3 shows our
experimental results for humans' overall ability to determine the presence of color
perturbations as a function of bit-depth. Clearly, the threshold of one is passed
between the fifth and sixth least significant bits. Using linear interpolation, we find
that the threshold is crossed at just over fourteen bins.
The p(A) metric is computed by finding the area under the curve denoted by
a linear interpolation from [0,0] to [1,1] through a given [hit rate, false alarm rate]
point. This corresponds to
p(A) = (fh)/2 + h(1 - f) + ((1 - f)(1 - h))/2 (3.3)
where h is the hit rate and f is the false alarm rate.
When the d' metric is 0, p(A) is 0.5. When the d' metric is ∞, p(A) is 1.0. The typical
threshold used with the p(A) metric is 0.75. The advantage of this metric, as opposed
to the d' metric, is that the p(A) metric is bounded. If we plot the results of this
calculation for our data as a function of bit-depth, as shown in Table 3.3, it is clear
that the same pattern as that shown in the d' results is present. Again, the results are
almost identical for the 5th, and 4th least significant bits, and increase dramatically
for the 6th LSB. Clearly the threshold lies between the 5th and 6th least significant
bits, corresponding to hue shifts of between 8 and 16 out of 256. This corresponds to
a bit-depth of between 3 and 4. A bit-depth of 3 corresponds to 8 categories, which
is the result if the threshold is set to the 6th LSB. A bit-depth of 4 corresponds to
16 categories, which is the result if the threshold is set to the 5th LSB. Using linear
interpolation, we find that the p(A) threshold is crossed at 12 bins.
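Both metrics are straightforward to compute from the tabulated rates. The following Python sketch (not the original analysis code) reproduces the fifth-LSB entries of Table 3.3 from the counts in Tables 3.1 and 3.2.

```python
from scipy.stats import norm

def d_prime(hit_rate, false_alarm_rate):
    """d' = z(hit rate) - z(false alarm rate), Equation 3.1."""
    return norm.ppf(hit_rate) - norm.ppf(false_alarm_rate)

def p_a(hit_rate, false_alarm_rate):
    """Area under the two-segment ROC curve through the observed point, Equation 3.3."""
    h, f = hit_rate, false_alarm_rate
    return f * h / 2 + h * (1 - f) + (1 - f) * (1 - h) / 2

# Fifth LSB: 1812 hits in 3340 signal trials, 92 false alarms in 330 noise trials.
print(d_prime(1812 / 3340, 92 / 330))   # about 0.69, matching Table 3.3
print(p_a(1812 / 3340, 92 / 330))       # about 0.63, matching Table 3.3
```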
Using binomial analyses1, we find that the proportion of correct responses for the
4th, 5th and 6th least significant bits are .566 ± .031, .558 ± .017 and .838 ± .031
respectively, at a 95% confidence level. The 4th and 6th least significant bits are
significantly different (p < .05), as are the 5th and 6th least significant bits. However,
the 4th and 5th least significant bits do not differ significantly. This supports the d'
and p(A) results.
1Analyses performed by Dr. Keith D. White
Table 3.4: Gamma Calibration Data
Level Overall Red Green Blue
1.0 7.5 1.950 5.60 0.740
0.75 3.8 0.375 1.20 0.195
0.5 1.5 1.000 3.00 0.420
0.25 0.4 0.093 0.24 0.069
3.2.2 Calibration
The monitor calibration consisted of determining the luminance of the screen for
different RGB combinations. Table 3.4 shows the results at four different luminance
levels. The gammas computed from these levels are 2.1144 for the overall luminance,
2.1958 for the red channel, 2.2722 for the green channel, and 1.7114 for the blue channel.
The actual change in hue viewed by the participants corresponding to each possible
hue value was determined using the following procedure. Saturation is first defined as
255 and lightness as 192. For each possible hue, the HLS coordinate and the perturbed
coordinate (for each bit-depth from 3 to 7) are first converted to computer RGB
coordinates. These values are converted to relative luminances using the gammas
computed above. The relative luminances are scaled by the appropriate maximum
value for each channel corresponding to the results for level 1 (100%) in Table 3.4.
These normalized RGB luminance values are converted back to HLS space. The
output of the procedure is the absolute difference between the two resulting hue values.
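A simplified Python sketch of this procedure is given below. It uses the standard colorsys conversions, assumes HLS components normalized to [0, 1], and omits the per-channel maximum-luminance scaling from Table 3.4, so it only approximates the procedure actually used.

```python
import colorsys

GUN_GAMMAS = (2.1958, 2.2722, 1.7114)   # red, green, blue gammas from above

def displayed_hue_shift(hue, bit_depth, lightness=192 / 255, saturation=1.0):
    """Approximate hue change produced on the monitor when the given LSB of an
    8-bit hue is flipped: HLS -> RGB, per-gun gamma, back to HLS."""
    def monitor_hue(h8):
        r, g, b = colorsys.hls_to_rgb(h8 / 255.0, lightness, saturation)
        r, g, b = (c ** gamma for c, gamma in zip((r, g, b), GUN_GAMMAS))
        return colorsys.rgb_to_hls(r, g, b)[0]
    perturbed = hue ^ (1 << (bit_depth - 1))
    shift = abs(monitor_hue(hue) - monitor_hue(perturbed))
    return min(shift, 1.0 - shift)   # hue is circular
```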
Figure 3.6 shows the results of this procedure for each bit-depth/number of bins.
The desired values, the differences expected when the axis is uniformly divided into
the appropriate numbers of bins, are shown as dotted lines. The solid lines show the
change in hue as produced by the monitor. The average values of the solid lines are
Figure 3.6: Monitor Calibration Results
extremely close to the desired values, differing by less than 1%. The numbers in the
legend refer to the number of bins corresponding to the given bit-depth.
Figure 3.7 shows the results when the actual shifts in hue after monitor calibration
are used to determine a p(A) value for each range of shifts. First, histograms of the
amount of shift for each trial were generated from the calibration results. Each bin
corresponded to a 0.01 shift in hue using the y-axis in Figure 3.6. Any histogram
bins containing no signal trials or no noise trials were eliminated. The remaining bins
were used to calculate the hit rate and false alarm rate for each bin. The resulting
values from each bin were used to determine the p(A) result shown in Figure 3.7.
The p(A) results from Section 3.2.1 are plotted in green, and clearly correspond well
to the sweep results, even crossing the threshold at the same number of bins. The
d' threshold was 14 bins, corresponding well to the p(A) threshold of 12 bins. This
Figure 3.7: Results of p(A) analysis with actual change derived from monitor calibration.
supports our conclusion in Section 3.2.1, that the threshold for hue discrimination
with a five second delay lies between the fifth and sixth least significant bit.
The implication is that human color memory has a bit-depth of between 3 and 4.
Use of two metrics shows a significant difference in sensitivity between the 5th and
6th least significant bits. In general, we can only reliably remember colors as members
of between 8 and 16 categories. This corresponds nicely to simple gradations of the
four perceptual chromatic channels, where a color is remembered as being on one
side or another of a given channel. For example, it would be possible to remember
a color as being on the blue side of green, rather than on the yellow side of green,
but distinguishing between a remembered green blue-green and a remembered blue
blue-green would be difficult.
Subjects reported that by the end of the 20 minute session, they were beginning
to lose track of what color they were looking at, and at the end of five seconds would
have forgotten whether they had just seen a green or a yellow patch. In addition,
virtually all participants reported using mental tricks to remember a specific color.
For example, one person said that he remembered the oranges by associating them
with sports teams ("This is Tennessee orange"). In the real world, people rarely need
to remember the precise color of an object. They simply assign it to a category and
use other cues and a priori information to recognize the object ("I have only one
orange mug; I am in my house; I am looking at an orange mug; therefore this must
be my orange mug.") If the precise color of an object is important, humans do not
trust to color memory alone. Therefore, given the intense and specific nature of the
experimental task, it is unlikely that humans are even this precise in remembering
the colors of real objects under real world conditions. So one question remains: Is
this color segmentation sufficient for color object recognition? If so, separation of the
hue axis into eight chromatic color categories should be sufficient.
QUANTIZATION
Color indexing is a very powerful and robust object recognition algorithm, suf-
fering from one problem: it is not robust to illumination changes. Various methods
attempt to compensate for this without losing the algorithm's inherent robustness
to occlusion, orientation, and scaling, without marked success. The color constancy
algorithms with the best results were detailed in Chapter 2, although they are cur-
rently insufficient for adequate object recognition. Color constant color indexing
is promising, but requires elimination of saturated pixels for consistently good results.
Any color constancy preprocessor improves in accuracy compared to the original al-
gorithm alone. Unfortunately, this does not improve recognition accuracy sufficiently
for a truly robust object recognition system.
The algorithm does not currently incorporate any version of color memory. Having
shown that color memory has a dramatic effect on human resolution of hue and hue
perception over time, we would like to incorporate it into the color indexing algorithm.
If the human system can be taken as a high-level model of the processing occurring
in the color object recognition task, then the effect of memory should certainly be
taken into account. How can that be done?
Quantization is the engineering equivalent of the color categorization that takes
place in memory. In some ways, our computers already have lossy memory for color
images, as many of our video compression algorithms involve quantization to many
fewer colors than our displays are capable of reproducing. In order to incorporate
the effects of color memory into the algorithm, we need to incorporate quantization.
Clearly, if we reduce the number of histogram bins, we will make the algorithm
substantially more efficient.
Unfortunately, nothing is known about how the human visual system actually
determines given categories, or how a given perceived color is assigned to a given
category. Therefore, we will have to derive insight on quantization methods from the engineering literature.
Most quantization algorithms to date have been concerned primarily with one of
two cases. First, in the case of monitor displays with limited resources, implementa-
tions are concerned with a 256-color limitation of display devices, rather than with an
8 or 16 color limitation. The goal of these algorithms is to display an image as close as
possible to the original image. Ideally, a human viewer should find the output image
indistinguishable from the original. In the second case, quantization is used to store
or transmit the image as efficiently as possible. Again, the desired output is assumed
to be as close as possible to an observer's perception of the original. Time-dependent
effects of the human visual system, such as chromatic adaptation and degradation of
the image due to memory are explicitly ignored because the purpose of this quanti-
zation is to allow the human observer to view the image independent of these effects.
Because of this goal, a second assumption inherent in all these quantization methods
is that comparison to the original image is the best measure of the method's im-
pact. This means that generally, quantization itself is seen as introducing noise and
degrading the image.
In our case, the best measure of the effect of the algorithm is comparison to other
images of the same scene. The original image contains noise in the pixel values.
A second image of the same view will generally contain different (although similar)
values. If the lighting is unchanged, a quantization algorithm that is effective for these
purposes should map the images to the same result, rather than to different results.
Comparison to the original image will give a sense of the degradation to the image
caused by the algorithm. Comparison to different images, rather than comparison of
storage or display constraints, should provide the metric for the effectiveness of the
Figure 4.1: Original image used to demonstrate results of the different quantization algorithms.
quantization algorithm. Instead, we will measure an algorithm by its effectiveness
when used in the context of the object recognition algorithm. A good algorithm
will produce good object recognition results, and a poor one will not identify objects
correctly. As with color constancy algorithms, simply deeming a result
"good" is insufficient. Object recognition accuracy provides an excellent metric for
determining the ability of an algorithm to compensate for lighting variation, but
viewing the quantized images can be helpful in understanding the mistakes made by
the algorithm. Figure 4.1 shows the original image used to display the results of the
different quantization algorithms.
4.1 Overview of Algorithms
4.1.1 Uniform Quantization
This is the simplest of the quantization algorithms. The color space is divided up
into n blocks of equal volume. The centroid of each block is the color used in the color
palette. The color palette is therefore fixed, data-independent, and takes no notice of
Figure 4.2: Results of uniform quantization in RGB space to 8 levels.
the combinations that humans may find more pleasing. Generally this algorithm is
considered to produce poor results from a human standpoint. Figure 4.2 shows the
results of uniform quantization to 8 colors.
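A minimal Python sketch of uniform quantization, assuming a floating-point RGB image with values in [0, 1], is shown below.

```python
import numpy as np

def uniform_quantize(image, levels_per_axis=2):
    """Divide RGB space into equal blocks and replace each pixel with the
    centroid of its block; 2 levels per axis gives an 8-color palette."""
    img = np.clip(np.asarray(image, dtype=float), 0.0, 1.0)
    idx = np.minimum((img * levels_per_axis).astype(int), levels_per_axis - 1)
    return (idx + 0.5) / levels_per_axis
```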
4.1.2 Dithering
This process is used to improve the results of quantization for human viewing.
A small amount of noise is added to each pixel of the image before quantization
occurs. This improves the quantized image (for a human) by eliminating banding:
the effect that occurs when a smooth transition between two colors is replaced by
distinct bands of colors from the palette. However, for object recognition purposes,
and for low numbers of colors, banding may be useful. At very low numbers of colors,
dithering will have very little effect, as the effect of low numbers of colors is increased
resistance to noise. The results of dithering are shown in Figure 4.3.
Figure 4.3: Results of dithering on uniform quantization (RGB space, 8 levels).
4.1.3 Modified Uniform Quantization
In this algorithm, instead of dividing the space evenly into n blocks of equal
volume in three dimensions, the n blocks are either cubic or rectangular, with an
integer number of them to a side. The process can be seen simply in two dimensions.
If n = 5, we start with a 2x2 array. Then one block is added in one row, to give five
blocks from one row of two and one row of three. If n = 6 we have a 2x3 array. If
n = 7, one block is added to one column, yielding two columns with two blocks, and
one with three. For n = 8, there would be one column with two blocks and two with
three, and for n = 9 we would have a 3x3 array. In this way, the largest difference
between the maximum and minimum number of blocks in any single dimension is
one, and the number of blocks in all other dimensions have neither a maximum nor a
minimum. This is a data-independent method for simple quantization to few colors.
Figure 4.4: Results of modified uniform quantization in RGB space to 9 levels.
The axis along which the variable block size occurs can be chosen
in advance to fit the data best, making this a data-dependent method, or it can be
determined in advance and fixed, making this a data-independent method. In RGB
space, modified uniform quantization reduces to uniform quantization for n³ colors,
when n is an integer. Quantization to 9 colors is shown in Figure 4.4.
4.1.4 Tree Quantization
In this quantization scheme, the RGB values are first converted to the Hue-
Lightness-Saturation space, covered in detail in Appendix A. The advantage of this
space is that most meaningful color demarcations (chromatic/achromatic; light/dark;
dark color/black; light color/white) can be made with simple threshold operations.
Separating the color space into chromatic and achromatic regions can be achieved
by thresholding the saturation and lightness axes. Saturation of one represents max-
imal saturation for the entire length of the lightness axis. One threshold on the
saturation axis and two on the lightness axis determine a ring of chromatic colors.
Values above the upper lightness threshold are assigned to the lightest achromatic
value. Colors whose lightness is below the lower lightness threshold are assigned to
the darkest achromatic value, regardless of saturation. All colors whose saturation
is below the saturation threshold are assigned to achromatic values based on their lightness.
The first division occurs between achromatic and chromatic. Then the achromatic
region is uniformly quantized along the lightness axis to A regions. The chromatic
region is uniformly quantized in the hue dimension to C regions. All chromatic
values are arbitrarily assigned to a lightness of 0.55 and saturation of 0.6, while all
achromatic values are assigned zero saturation and hue. Both A and C are set by the
user, and A + C is the total number of colors in the final color map. For example, if
A = 0 and C = 2, the final map would contain two colors, diametrically opposed in
hue. If A = 1 and C = 1, the final map would contain medium gray and a single
saturated hue.
regions. The image is first converted to HLS space. A saturation threshold of 0.15
determines whether each pixel is chromatic or achromatic. If the pixel is achromatic
(saturation below 0.15), it is put into one of three achromatic bins. Below a lightness
of 0.2, the pixel is put in the black bin. Above a lightness of 0.9, the pixel is put in the
white bin. Between the values, the pixel is assumed gray. If the pixel is chromatic,
it is placed in one of n chromatic bins. The boundaries for the chromatic bins are
determined by uniformly quantizing the hue axis into n regions. In this figure, there
are 11 chromatic regions, and the hue axis of the HLS space is offset by 60 degrees,
placing pink at both 0 and 1.
Figure 4.5: Diagram for tree quantization.
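A Python sketch of this decision tree for a single pixel is given below, using the thresholds just described (saturation 0.15, lightness 0.2 and 0.9, 11 chromatic bins) and assuming HLS components normalized to [0, 1]; the label names are illustrative.

```python
def tree_quantize_pixel(h, l, s, n_chromatic=11,
                        sat_thresh=0.15, dark=0.2, light=0.9):
    """Assign one HLS pixel to an achromatic bin or a uniform hue bin; returns
    a label rather than a palette color."""
    if l < dark:
        return "black"            # dark colors, regardless of saturation
    if l > light:
        return "white"
    if s < sat_thresh:
        return "gray"             # low-saturation, mid-lightness colors
    return "hue bin %d" % min(int(h * n_chromatic), n_chromatic - 1)
```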
4.1.5 Median-Cut Quantization
This is a data-dependent method using the median of the values of the three
channels in the input image, given a colormap. Assume n is the number of colors
desired in the final colormap. The median values of the input map are used to split
the colorspace into two new boxes. The limits of these boxes are used to find the
median values of the new colormap in RGB space, which are in turn used to split
the current boxes into new boxes along the largest axis, and so on until n boxes have
been generated. The averages of the colors in each of these n boxes are the values
in the new colormap. Figure 4.6 shows the results of this algorithm, again for eight
colors.
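A compact Python sketch of median-cut quantization in this spirit is shown below; since the description above does not specify which box to split next, splitting the most populous box is assumed here.

```python
import numpy as np

def median_cut_palette(pixels, n_colors=8):
    """Repeatedly split a box of pixels at the median along its widest RGB
    axis until n_colors boxes exist, then use each box average as a palette entry."""
    boxes = [np.asarray(pixels, dtype=float)]
    while len(boxes) < n_colors:
        splittable = [i for i, b in enumerate(boxes) if len(b) > 1]
        if not splittable:
            break
        i = max(splittable, key=lambda k: len(boxes[k]))       # most populous box
        box = boxes.pop(i)
        axis = np.argmax(box.max(axis=0) - box.min(axis=0))    # widest axis
        order = box[:, axis].argsort()
        half = len(order) // 2
        boxes += [box[order[:half]], box[order[half:]]]
    return np.array([b.mean(axis=0) for b in boxes])
```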
4.1.6 Vector Quantization
Vector quantization is a general term for data-dependent quantization methods.
The color palette is generated via an iterative procedure, similarly to median-cut
Figure 4.6: Results of median-cut quantization in RGB space to 8 levels.
quantization above. These algorithms are even more time-consuming to run, but
produce subjectively good results. One possible implementation follows these steps.
First the most common color is assigned to the palette, and the error is evaluated
(using mean square error or another method of choice). Then the next most common
color is assigned to the palette, and the remaining colors are assigned to one of the
two values so as to minimize the error. This process is repeated until n colors are assigned to the palette.
4.2 Number of Colors
Generally, the optimal case for human viewing is where quantization is unneces-
sary, so computer vision research tends to assume a fixed maximum number of colors
(usually 256), and generates algorithms that produce the most pleasing results. We
have not found any algorithms in the literature on choosing this maximum by any
methods other than an analysis of the transmission times or storage constraints for
the image data, or by using the maximum number of colors a display can show. These
are important, but not particularly helpful for this research. Our assumption is that
the necessary storage can be added to the robot or computer, as we will be reducing
the data to substantially fewer than 256 colors. One constraint of interest to us may
be the time necessary to perform the quantization. Another possible metric would
consider more closely the task we are asking quantization to perform.
4.2.1 Optimizing Accuracy
The previous techniques are all designed to minimize artifacts visible to the human
viewer. However, for our purposes, the human viewer is irrelevant. Instead, we need
a metric that will enable the system to correctly classify colors even when the lighting
changes. Because the algorithm is primarily concerned with chromatic responses, and
for speed and simplicity, only the hue axis is considered.
The accuracy sweep method is a simple, greedy approximation to the brute force
method of trying all possible combinations to obtain the bin locations that produce
the highest possible accuracy on the training data.
The first two bin locations are found by sweeping all possible combinations of bin
boundaries along the hue axis. There are 256 possible locations for the first bin and
255 for the second because boundaries at the same location would effectively produce
only one bin. Once the first two bins have been determined, they are fixed and the
third bin boundary is found by calculating the accuracy on the training data when
the third bin boundary is swept through all 254 possible locations. Then the third
bin boundary is fixed and the fourth is swept, and so on up to the nth bin boundary,
producing the desired number of bins.
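A Python sketch of the greedy sweep follows; accuracy_fn is a placeholder for a routine that evaluates object recognition accuracy on the training data for a given sorted list of hue-bin boundaries.

```python
def greedy_boundary_sweep(accuracy_fn, n_boundaries, n_levels=256):
    """Exhaustively place the first two boundaries, then add one boundary at a
    time, keeping the placement that maximizes accuracy with earlier ones fixed."""
    best, best_acc = None, -1.0
    for a in range(n_levels):
        for b in range(a + 1, n_levels):
            acc = accuracy_fn([a, b])
            if acc > best_acc:
                best, best_acc = [a, b], acc
    boundaries = best
    while len(boundaries) < n_boundaries:
        best, best_acc = None, -1.0
        for c in range(n_levels):
            if c in boundaries:
                continue
            candidate = sorted(boundaries + [c])
            acc = accuracy_fn(candidate)
            if acc > best_acc:
                best, best_acc = candidate, acc
        boundaries = best
    return boundaries
```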
Again, here we assume we know how many bins are desired. We want the quanti-
zation algorithm to free us from needing a color constancy preprocessor. How many
colors are necessary if we wish each bin to contain most of the hues that result from
images of a single color under a variety of lighting changes? To determine this, we
gathered data of different colors at different times of day and analyzed the results.
4.2.2 Lighting Shifts
Images of a color calibrator and a set of cans were taken under a variety of con-
ditions. Data for 2 different lighting conditions was taken at every hour, starting at
9 am and finishing after dark, with an additional image at 6:20 pm (at dusk). The
lighting conditions consisted of the four permutations of the status of the camera's
white balance (on or off) and the status of the blinds (open or closed). The resulting
images were segmented into color patches representative of different colors. A Gretag
Macbeth ColorChecker was used as the calibrator, and a set of cans representative
of common colors used in soda can label design were also present in the images. The
layout was unchanged over the course of the day. The color calibrator consisted of 24
different colors, including 6 achromatic colors. The remainder of the image produced
another 24 sample colors, including both common representations of black and white
and dark background regions, such as the shadow under the table where the cans and
the calibrator were set up. Figure 4.7 shows the range of colors that a single white
region of a single can took over the course of the day.
This data was analyzed to determine the scope of possible lighting shifts, and po-
tentially to determine a likely function for mapping colors taken under one illuminant
to those taken under another.
The range of values for a single patch of color, under a single lighting condition,
was much higher than anticipated. Generally, along the hue axis, a single patch
would have a standard deviation below .05 where the original data are scaled to have
potential values from 0 to 1. To encompass one standard deviation to each side of
the mean would require a bin size of roughly 28 out of 256 possible hue values. Thus, only
Figure 4.7: Sample color changes under a single day's sunlight.
8 to 10 bins along the hue axis would be possible. However, this small number is very
close to what we know of the accuracy of human memory. Instantaneously, we average
over a given spatial region to get an estimate of the local color. Over time, we do not
store the precise color, but only a rough representation of it. Strangely enough, the
standard deviation along the hue axis when all the pixels of a given color over the
course of a day were included produced a similar maximum. From the variability of
the data, it appears that 8 to 10 bins is a very good approximation to the accuracy
of the input data.
A graph of these results for the set of colors found in soda cans is shown in
Figure 4.8. This graph shows the mean and standard deviation for chromatic (blue)
and achromatic (black) colors. This plot is for the soda can colors, rather than the
calibrator colors. The achromatic colors included black, the dark background color,
the desk the cans were placed on, the middle brightness background color, silver,
Figure 4.8: Average and standard deviation for hue under varying lighting. The x
location of each bar indicates the average of the average values for the day.
white, and the lightest background color. The desk was closest to the gold or orange
hue. Black, silver and white were all colors taken from regions of cans. Clearly the
hue axis is a good choice for quantizing chromatic regions, as the standard deviation
for chromatic colors is very low and the separation between colors is reasonably good.
Figure 4.9 shows the difference between the averages for the cans (lower row) and
the calibrator (upper row) hue results. These data points clearly seem to cluster in
along the hue axis. The region between 0.3 and 0.4 is almost empty, while the region
including values just below 1.0 and values from 0.0 to 0.2 is very crowded. From
0.5 to 0.7, there are three distributed can classes, but the calibrator classes are more
tightly clustered without such clear divisions between colors.
If we assume that the mean and standard deviation calculated over the course of
the day are representative of Gaussian distributions, we can derive a Bayes classifier
Figure 4.9: Mean values for cans and calibrator hues.
to determine the best color to choose for a given HSV triple. Figure 4.10 shows the
probability density functions for the chromatic can colors along the hue axis. It is
assumed that each color is equally likely. The color of each line is determined by the
average RGB coordinates for that color for the day. Clearly, the hue axis is separa-
ble into distinct hue categories. Multiple peaks are present in these regions because
the 17 chromatic can colors include more than one representative of important col-
ors. For example, a light yellow and a darker yellow were both taken from images of
the Country Time LemonadeTM can. Similarly, there were two purple measurements
(Welch's Grape Drink), two cyan measurements (FrescaTM) and two pink measure-
ments (Diet Cranberry Canada Dry Ginger AleTM and TabTM). Gold (Caffeine-Free
Coca-Cola TM) seems to have a separate distribution from yellow, while many colors
overlap in the red region of the hue axis. Natural separations fall between yellow and
Figure 4.10: Probability density functions for chromatic can colors along the hue axis.
green, green and blue, blue and purple, purple and reds, reds and yellow. The reds
class contains not only red and dark red, but also both pinks and both oranges.
In addition, the achromatic colors are separable on the other two axes. Figure 4.11
shows the probability density functions for all the can colors along the saturation axis.
Clearly, the light achromatic colors are much lower in saturation than the others.
The dark achromatic colors, however, are just as saturated as the lighter chromatic
colors. Figure 4.12 shows that the dark achromatic colors can be distinguished with
a threshold on the value axis.
Using these probability density functions we can derive a classifier with 8 or 10
colors. For the 8-color case, gold and yellow are considered one category, and red
contains red and dark red, both pinks, and both oranges. The 10-color classifier
separates gold and yellow into 2 classes, and finds a separate category for pinks. The
Figure 4.11: Probability density functions for all can colors along the saturation axis.
eight color regions common to both classifiers are black, white/silver, red, yellow,
green, cyan, blue, and purple. We determine that black is present when the value
coordinate is below 28 on a scale of 1 to 256. White and silver are grouped into a
single class, identified by saturation of less than 95. The remainder of the classes are
defined on the hue axis. Reds are in the range below 18 and above 206, while yellow
and gold lie in the 18 to 47 range. Greens are present from 47 to 92, cyans from 92 to
144, and blues from 144 to 164. Purples fill the hole from 164 (blues) to 206 (reds).
The calibrator had a magenta patch that formed a clear peak between the purple and
red clusters, but magenta is not well represented among can label colors, so magenta
is included in the purple category.
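These thresholds translate directly into a small classifier. A Python sketch of the 8-color version follows, with all inputs on the 0 to 255 scale used above.

```python
def classify_hsv(h, s, v):
    """Eight-color classifier using the thresholds quoted above; the 10-color
    version further splits yellow/gold and carves pink out of red using the
    saturation and value axes."""
    if v < 28:
        return "black"
    if s < 95:
        return "white/silver"
    if h < 18 or h >= 206:
        return "red"
    if h < 47:
        return "yellow"   # gold and yellow share this bin in the 8-color scheme
    if h < 92:
        return "green"
    if h < 144:
        return "cyan"
    if h < 164:
        return "blue"
    return "purple"
```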
The sample image quantized with the 8-color scheme is shown in Figure 4.14.
The wheels and pavement are categorized as achromatic/light and the shadows are
Figure 4.12: Probability density functions for all can colors along the value axis.
categorized as blue and cyan, as they are not dark enough to fit in the black category
used by the soda cans. This is primarily because the truck image is a generic image
included with the Adobe Photoshop package, for use in tutorials, and the soda
can images were taken with a Sony Digital8 camera in the laboratory. Images of
soda cans, quantized with this method, assign black regions on the cans to the black category.
The two additional regions in the ten-color scheme consist of gold, from 18 to
31 along the hue axis, leaving yellow from 31 to 47, and pink. Figure 4.13 shows
the probability density functions along each axis for the colors included in the red
category. Clearly, one color is separable along the value axis, and one is separable
along the saturation axis. These two colors are the two pinks. The two pinks are
put in a tenth category, consisting of two subdivisions of the red category. The pink
category is defined as those colors that fit in the red category along the hue axis,
Figure 4.13: Probability density functions for the colors in the red category along the hue, saturation, and value axes.
whose saturation is below 163 (and above the white/silver saturation boundary), or
whose value is above 123, or both.
The main problem with existing quantization implementations lies in the final goal of
the quantization. The intent of most algorithms is maintenance of the image in terms
of instantaneous human perception, while incorporating reduced storage and display
constraints. This means that the maximum possible number of colors is used. Metrics
used to rate images and techniques include bit-rate (in coding) and quality in terms
of human perception. My algorithm is concerned with using the minimum possible
number of colors to represent an image while maintaining the ability to recognize
an object, not the maximum available while maintaining human perception of the
image. Because the data stored are in the form of a series of histogram values, not
Figure 4.14: Sample image quantized using the 8-color lighting-shift based classifier.
in the form of an image, bit-rate values for an image are not a useful metric. This
makes it very difficult to use the research in the literature to analyze an algorithm's
suitability for this project. We take the approach that the degradation in color
representation introduced by quantizing an image of an object can actually improve
the desired result: namely identifying the same object in multiple images. Generally,
quantization algorithms are judged qualitatively. The field is in agreement that mean
square error is a poor metric of image quality, and no other metric has been proposed
with sufficient acceptance to provide a clear quantitative measure of image quality.
Analysis of data gathered under typical laboratory lighting conditions resulted
in a plausible quantization scheme. In addition, this analysis provided a clearer
understanding of the nature of the problem. The variance in lightness or value is
high, rendering that feature unlikely to be helpful except in the broadest sense. The
segmentation into different color categories is not intuitive in the RGB space, while it
is very straightforward using hue and saturation as salient features of the data. This
resulted in eight or ten color categories, close to the number of categories resulting
from the work in Chapter 3.
Little work has been done for cases of extreme quantization for object recognition.
Generally, color quantization has focused almost exclusively on compression with
minimal loss in quality with regard to instantaneous human image perception. In this
case, quantization is usually regarded as introducing noise, rather than as reducing
it. There are a few exceptions to this rule [17, 6]. They attempt to compensate
for non-linearities in the data by transforming the original data to a new space,
via the K-L transform or a DCT, and then uniformly quantizing, rather than by
using an adaptive quantization algorithm. Because they are doing this with respect
to Swain and Ballard's algorithm, their results are of more interest here than the
other results. In general, they have found that using these transforms combined
with uniform quantization resulted in high accuracy that dropped off sharply when
the number of bins fell below eight. The resulting number of categories is similar,
regardless of how the number is obtained.
THEORY AND RESULTS ON SYNTHETIC DATA
5.1.1 Structure and Methods
The color histogram indexing method derives much of its robustness from its
simplicity. The feature vectors consist of normalized color histograms of images of
the objects in the database. In our implementation, we determine match closeness
by a simple Euclidean distance measure between feature vectors. We have variables
for the number of objects in the database, the number of colors in the original space,
the number of colors in the final space, and the number of values used to form the
histogram. For example, suppose the final histogram contained 512 values. In general,
only the few largest have object-related information in them. The smallest values
are very noisy. Instead of keeping all 512 values, we could keep only a few of the
largest, and set the rest to zero. This would greatly increase the storage efficiency
of the system, as only the non-zero values and their index would need to be stored.
Because of storage and processing constraints, the preferable option would be to
simply quantize the space to only a few colors in the first place. If fewer colors are
used, only the magnitudes need be stored in an ordered array, rather than a more
complex structure incorporating the index values and eliminating the zeros.
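The storage trade-off just described can be sketched as follows in Python, assuming a quantized (indexed) image as input: the normalized histogram is the feature vector, only the p largest bins are retained, and objects are matched by Euclidean distance. This is a minimal illustration, not the implementation used for the experiments.

    import numpy as np

    def feature(indexed_image, n_colors, p=None):
        """Normalized color histogram of an indexed image; if p is given,
        keep only the p largest bins and zero the rest."""
        hist = np.bincount(indexed_image.ravel(), minlength=n_colors).astype(float)
        hist /= hist.sum()
        if p is not None:
            keep = np.argsort(hist)[-p:]          # indices of the p largest bins
            mask = np.zeros_like(hist)
            mask[keep] = 1.0
            hist *= mask
        return hist

    def identify(query, database):
        """Index of the database feature closest to the query (Euclidean)."""
        return int(np.argmin([np.linalg.norm(query - f) for f in database]))

    # toy usage: two 4-color "objects" and a query built from the first one
    rng = np.random.default_rng(0)
    obj_a = rng.integers(0, 2, size=(8, 8))       # uses colors 0 and 1
    obj_b = rng.integers(2, 4, size=(8, 8))       # uses colors 2 and 3
    db = [feature(obj_a, 4, p=2), feature(obj_b, 4, p=2)]
    print(identify(feature(obj_a, 4, p=2), db))   # -> 0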
So how many color categories are sufficient? We showed in Chapter 3 that human memory for hues degrades at between 8 and 16 divisions of the hue axis. Drew
et al.  and Berens et al.  found accuracy began to degrade at between 8
and 16 categories in their respective transformed spaces. Analyzing the variation in
illumination due to sunlight over the course of a day resulted in 8 or 10 categories
useful in soda can identification. All these results indicate that the region between 8
and 16 categories may be where we should investigate.
Now that we have an approximate number of categories to explore, how do we
choose the categories themselves? Computationally, this should have little or no
impact on the final requirements of the system. Ideally, the categories should be
fixed for the expected environment of the robot, so that a lookup table is all that is needed.
We begin with a modified version of uniform quantization. Obviously, uniform
quantization in RGB space produces quite different categories than uniform quan-
tization in a hue-based space. Given that we are concerned exclusively with color
information, a hue-based space seems most promising. We first test quantization to
512 colors in RGB space, which allows us to roughly determine the robustness of the
original algorithm under multiple lighting conditions. Second, we test quantization
to 64 colors in RGB space. We then test quantization of each original image to 14
colors in HLS space, obtaining 3 achromatic colors (white, gray, and black) and 11
chromatic colors (pink, red, orange, yellow, three greens, a dark cyan, light blue, dark
blue, and purple) using the "tree" quantization described in Section 4.1.4. Feature
vectors are generated by scaling the histograms of the indexed images to sum to one.
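A minimal sketch of the uniform RGB quantization step is given below; the 512- and 64-color cases correspond to 8 and 4 uniform levels per channel. The 14-color HLS palette is produced by the tree method of Section 4.1.4 and is not reproduced here.

    import numpy as np

    def uniform_rgb_index(image, levels):
        """Uniformly quantize an HxWx3 uint8 RGB image to levels**3 colors,
        returning one integer index per pixel."""
        q = (image.astype(int) * levels) // 256        # 0 .. levels-1 per channel
        return q[..., 0] * levels * levels + q[..., 1] * levels + q[..., 2]

    img = np.random.default_rng(1).integers(0, 256, size=(4, 4, 3), dtype=np.uint8)
    idx_512 = uniform_rgb_index(img, 8)    # 8 * 8 * 8 = 512 colors
    idx_64 = uniform_rgb_index(img, 4)     # 4 * 4 * 4 = 64 colors
    print(idx_512.max() < 512, idx_64.max() < 64)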
5.1.2 Theoretical Results
Assuming all colors and combinations of colors are equally likely, our equations
find the expected value of the error. In reality, a given database will produce better
or worse results with this technique, depending on the quantization method used and
the characteristics of the database. Noise in the form of camera calibration, non-
uniformity of color distribution in a given database, and lighting variations will all
affect any results on real data. However, given appropriate conditions (objects easily
Figure 5.1: Diagram showing examples of each variable used in the theoretical equations.
distinguishable by color alone, or the algorithm used as one method among many for
identifying the objects, and appropriate quantization methods for the characteristics
of the data in question), real results should at least follow the same trends as those presented here.
There is a tractable solution to the effects of the variables on expected error
for 3 objects [53, 52]. For more objects, it is possible to derive a specific analytic
solution, but simpler to implement a program to calculate appropriate values. For
larger databases (more than 10 objects) even this approach becomes unwieldy and
time-consuming. The 3-object solution is presented here, along with a description of
the program used to generate the accuracy values for up to 6 objects and simulation
results for larger databases.
For databases containing only three objects (n = 3), we wish to determine the
expected error. In this case, the variable c is used to represent the number of colors in
the original (not quantized) space, k is used to represent the number of colors in the
quantized space, and p is used to represent the number of values of the histogram.
These variables are illustrated in Figure 5.1. In this figure, there are six colors in
the original space. As a result of quantization, red and orange are merged to form a
single new color, represented here as orange. As a result of this, the first two objects
become indistinguishable, while the other two are still separable.
There are c^p possible different objects in the input space. We choose three of
these for our database, and then quantize them. The resulting three objects in our
database are members of the k^p possible objects in the quantized space. For this
case, we assume that the only degradation occurring over time is modeled by the
quantization. The only time the error increases is when more than one of the objects
in our database quantizes to the same object in the new space. Otherwise, the objects
remain perfectly distinguishable and the error remains at 0. If there is more than one
copy of a given object, we choose randomly between them. The expected error for
a given set of objects can take only three values: 0 (all objects distinguishable), 1/3
(two objects the same) or 2/3 (all three objects identical). However, when we look at
the expected value of this error over all possible sets of objects, in terms of c, k and
p, we get
E[error] = ( N_2/3 + 2 N_3/3 ) / N_all    (5.1)
Here N_all is the number of possible cases, N_2 is the number of databases where there are two objects which are identical, and N_3 is the number of databases where all three objects are duplicates. N_2 is defined as
N_2 = Σ_j [ s_j (s_j - 1) / 2 ] (c^p - s_j)    (5.2)
and N3 is defined as
N_3 = Σ_j s_j (s_j - 1) (s_j - 2) / 6    (5.3)
where s_i is the number of objects in the original c-color space that are quantized to the same ith object in k-space, and N_all is the number of possible n-object databases in the c-color space:

N_all = (c^p)! / ( n! (c^p - n)! )    (5.4)

Figure 5.2: Theoretical results for c = 2^24, p = 2 and p = 3, n = 3, and k varying.
Figure 5.2 shows the results for this equation when p = 2 (crosses) and p = 3 (stars) for varying k. The effect of keeping p of the bins is clearly distinguishable. The y-axis is the error displayed on a log scale. Clearly, a larger p corresponds to a measurable reduction in error. Increasing p by one produces a reduction in error of more than an order of magnitude for k > 10. k = 10 is also the threshold for error below 1% for p = 2.
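For very small parameter values, the expected error can also be checked by brute force rather than through Equations 5.1 through 5.4. The Python sketch below enumerates every 3-object database, applies a uniform quantization to each of the p color values of each object, and averages the resulting error; the parameter values shown are purely illustrative.

    from itertools import product, combinations

    def expected_error_bruteforce(c, k, p):
        """Average identification error over all 3-object databases drawn from
        the c**p possible objects, with each of the p color values uniformly
        quantized from c levels down to k levels."""
        quant = lambda obj: tuple((v * k) // c for v in obj)
        objects = list(product(range(c), repeat=p))
        total = count = 0
        for a, b, d in combinations(objects, 3):
            qa, qb, qd = quant(a), quant(b), quant(d)
            dup = (qa == qb) + (qa == qd) + (qb == qd)
            if dup == 3:
                total += 2.0 / 3.0        # all three objects indistinguishable
            elif dup >= 1:
                total += 1.0 / 3.0        # exactly two indistinguishable
            count += 1
        return total / count

    print(expected_error_bruteforce(c=6, k=3, p=2))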
5.1.3 Larger databases
For more than 3 objects in the database, the computational complexity of the
theoretical solution increases greatly. However, for small n (up to 6 on our equipment
and with our time limitations), it is possible to compute the expected accuracy,
with one assumption. We assume that (c^p)/(k^p) is larger than n. This ensures that
Figure 5.3: Theoretical results for c = 2^24, p = 3, and k varying along the x axis. Each line corresponds to a different value of n.
all combinations of n of the k^p possible objects are possible. This is not a severe limitation, as in general we quantize from 2^24 colors down to tens of colors.
In Figure 5.3, we can see the effect of changing the number of bins in the new
space on the error. As k increases, the error decreases. If we plot the log of k versus
the log of the error, we get the lines shown. Each line corresponds to the results for
a given value of n. As n increases, the error also increases, but the relative increase
in error is small.
Figure 5.4 shows the behaviour of log error versus log k as p is increased. Again,
the results are shown on a log-log plot. As p increases, not only does the error
decrease, but the lines also become steeper. Thus, increasing p has a greater effect on
error when k is larger. Experimentally, the error when n = 4, p = 3 and k is on the
order of 1000 is of the same order of magnitude as the error when n = 4, p = 6 and
k = 25. Doubling the number of features kept reduced the number of colors needed
by a factor of 40 without changing the error.
For larger databases, we implemented a Monte Carlo simulation. These results
were obtained by averaging the error from 10, 000 randomly generated databases of n
Figure 5.4: Theoretical results for c = 2^24, n = 4, and k varying along the x axis. Each line corresponds to a different value of p.
objects. This average error is plotted in Figure 5.5. Predictably, error increases with
the number of objects in the database. As the histogram space becomes less sparse,
the system's robustness to fluctuations in hue deteriorates.
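The kind of Monte Carlo experiment involved can be sketched as follows, assuming objects modeled as p-tuples of colors drawn from the c-color space and a uniform quantization map; this is not the exact program used to generate Figure 5.5.

    import numpy as np

    def monte_carlo_error(n, c, k, p, trials=10_000, seed=0):
        """Average error over randomly drawn n-object databases.  An object is
        misidentified whenever another database object quantizes to the same
        p-tuple (ties broken at random)."""
        rng = np.random.default_rng(seed)
        errors = np.empty(trials)
        for t in range(trials):
            objs = rng.integers(0, c, size=(n, p))
            quantized = (objs * k) // c
            _, inverse, counts = np.unique(quantized, axis=0,
                                           return_inverse=True, return_counts=True)
            dup = counts[inverse.ravel()]          # duplicates per object (>= 1)
            errors[t] = np.mean(1.0 - 1.0 / dup)   # random choice among duplicates
        return errors.mean()

    print(monte_carlo_error(n=10, c=2**24, k=10, p=3, trials=2000))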
Theoretical results show coarse quantization can dramatically reduce necessary
storage and processing while producing a minimal reduction in accuracy. Increasing
the number of features kept dramatically increases the accuracy, while increasing the
number of objects reduces the accuracy. Large decreases in the number of colors
available can be compensated for by small increases in the number of features kept.
The results presented to this point are for the best possible case: only degradation
resulting from quantization is interfering with perfect accuracy. In the real world,
other factors affect accuracy. These include lighting variation (illuminant noise) and
differences in camera placement and calibration. In general, as long as the illuminant
noise is not too great (actual magnitude will depend on the database and the degree of
quantization), some amount of quantization will help eliminate this noise and improve accuracy.
Figure 5.5: Simulation (averaged) results for c = 2^24, p = 2 and p = 3, k = 10, and n varying.
5.2 Synthetic Data
There are two interlinked factors still to be tested. First, the theoretical results
assume both uniform distribution of colors/objects and uniform quantization. There
is no guarantee that the database will be in any way similar to the quantization
procedure followed. Thus a quantization scheme that is in some sense data-dependent
may be required. Second, and more importantly, these results show what happens
when there is no noise in the form of lighting shifts.
In order to simulate the effect of lighting shifts and test various quantization
schemes, further research was performed using synthetic data. The first experiment
tested the case when the objects all fall into a single region of the hue axis. Sec-
tion 5.2.1 describes the results of this experiment. Section 5.2.2 covers the results
when the objects are equally distributed in hue. The results when each object is con-
centrated in more than one hue region are covered in Section 5.2.3, and Section 5.2.4
shows the results when these multiply-hued objects are shifted in hue, not with a simple linear shift as in the previous sections, but with the lighting shifts measured in Section 4.2.2 of Chapter 4.
Figure 5.6: Image of single hue region synthetic database. Red corresponds to higher values; blue to lower values.
5.2.1 Single Hue Region
A synthetic database of 16 objects was generated. Each object consisted of a
sinusoidal bump, 8 of 256 bins wide. Figure 5.6 shows an image of this database.
Larger values are red; 0 is blue. The entire database takes up only half of the possible
hue axis. The joint purposes of this experiment were to verify that the accuracy
sweep method (described in Section 4.2.1 of Chapter 4) of optimizing bin selection was
working properly and to compare the results of the uniform and the accuracy sweep
methods of quantization. The accuracy sweep method should outperform uniform
quantization when the database is localized within one or more regions of the hue axis,
and should perform as well as the uniform method when the database consists of colors
evenly spread throughout the histogram space. When small lighting shifts occur, the
accuracy sweep method should dramatically outperform the uniform quantization
Figure 5.7: Results on single hue region database for varying numbers of bins, using uniform (blue) and accuracy sweep (red) quantization. Average accuracy across the different shifts is shown in the lower right hand plot. Standard deviation of average value is shown with dotted lines. Progressively larger shifts from none (upper left hand plot) to 7 (middle lower plot) are shown in the remaining plots.
method on training data except when uniform quantization results in precisely the
correct bin locations.
The number of bins was swept from 2 to 256 for uniform quantization and from 2
to 128 for accuracy sweep quantization. The accuracy results from 2 to 128 bins are
shown in Figure 5.7. Only 7 shifts were performed, as the 8th shift would cause all but
one of the objects to line up perfectly with a different object, producing 100% error on those objects. The single object that did not line up with another object would line up with nothing, producing error at the level of random chance (a 1 in 16 chance of identifying
the object correctly, given that object). As the shifts increase, the expected value of
Figure 5.8: Database of objects completely spanning the hue axis.
the accuracy would increase until a shift of 128, and then decrease until we were 8
bins away on the other side.
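The construction of this database and the effect of a circular hue shift can be sketched as follows in Python; only uniform rebinning is shown, since the accuracy sweep method chooses bin edges adaptively and is not reproduced here.

    import numpy as np

    N_BINS, WIDTH, N_OBJ = 256, 8, 16

    def make_database():
        """16 objects: sinusoidal bumps 8 bins wide, packed into the lower
        half of a 256-bin hue histogram."""
        bump = np.sin(np.linspace(0, np.pi, WIDTH))       # half a sine period
        db = np.zeros((N_OBJ, N_BINS))
        for i in range(N_OBJ):
            db[i, i * WIDTH:(i + 1) * WIDTH] = bump
        return db / db.sum(axis=1, keepdims=True)

    def rebin(hist, n_bins):
        """Collapse a 256-bin histogram into n_bins uniform bins."""
        edges = np.linspace(0, N_BINS, n_bins + 1).astype(int)
        return np.array([hist[a:b].sum() for a, b in zip(edges[:-1], edges[1:])])

    def accuracy(db, shift, n_bins):
        """Fraction of objects correctly identified after a circular hue shift."""
        train = np.array([rebin(h, n_bins) for h in db])
        test = np.array([rebin(np.roll(h, shift), n_bins) for h in db])
        correct = sum(int(np.argmin(np.linalg.norm(train - q, axis=1)) == i)
                      for i, q in enumerate(test))
        return correct / len(db)

    db = make_database()
    for shift in (0, 3, 6):
        print(shift, [round(accuracy(db, shift, nb), 2) for nb in (8, 32, 128)])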
Clearly the accuracy sweep method produces superior results on this type of data
set. It outperforms the uniform quantization by a larger and larger margin as the
values are shifted away from the original. Even for the largest shift, where only
one pixel of the shifted version overlaps with the original object, the accuracy sweep
method correctly identifies over half the objects, while uniform quantization fails
dramatically. Looking at the average accuracy, it is clear that overall the accuracy
sweep method performs better than the uniform method on training data.
The characteristic shape of the uniform quantization accuracy for large shifts (where "large" means shifts greater than half the width of the object bumps) is of more interest than the average accuracy results. For small shifts, accuracy degrades as the number of bins is reduced. For a medium shift (exactly
Figure 5.9: Results of uniform (blue) and accuracy sweep (red) methods.
half the object) the uniform accuracy is extremely unpredictable from one number of
bins to another. However, for large shifts, the uniform accuracy follows a predictable
trajectory. First the accuracy is very low (2 bins). As the number of bins is increased,
the accuracy increases (compatible with the theoretical results). Eventually a peak
is reached, and the accuracy begins to decrease. This peak is the point at which the
maximum number of pixels are being consolidated into the correct bin as a result of
the widened bin sizes. As the bin sizes decrease and the number of bins increases,
the accuracy decreases, until at 256 bins the accuracy is 0. No objects are identified
correctly, because every object overlaps more with a different object than with itself.
5.2.2 All Hues
Instead of limiting the objects to a single region, the new database has identical
objects spread along the entire hue axis. Instead of 16 objects, the new database
Figure 5.10: Original database for histograms with two colors.
(shown in Figure 5.8) has 32 objects. Each of these 32 objects is identical to one of the original 16 objects, merely shifted to a new region of the axis.
Figure 5.9 compares uniform and accuracy sweep methods. Clearly, when the col-
ors of the objects within the database are uniformly spread, and there is no noise in
the form of offsets, the uniform method does as well as the more optimized method.
However, when offsets are introduced, again the uniform accuracy degrades substan-
tially while the accuracy sweep method continues to perform well on training data.
5.2.3 Multiple Hues
Now that we know what happens when the database consists of identical objects
with single color regions, we wish to determine how the accuracy is affected by hue
shifts when each object consists of multiple hues. We begin with objects consisting of
two colored regions of equal size. This corresponds to histograms with two identical
Figure 5.11: Average accuracy of uniform (blue) and accuracy sweep (red) methods as a function of shift.
bumps. Figure 5.10 shows the new database. Again, red represents larger values and
blue, smaller values. The offset is 64 elements, so a sweep of 72 bins will show the
full range of possible accuracies. Each bump of each object is still 8 hue values wide,
generated with half a period of a sine wave.
Figure 5.11 shows the accuracy as a function of shift. The dotted lines indicate
standard deviation. The curve is bimodal, with the first peak corresponding to a
correct match of the first histogram peak with the first histogram peak for a given
object, and the second accuracy peak corresponding to a match of the first histogram
peak of an object with its shifted second peak. Because the second peak of this plot
corresponds to only one peak contributing to the accuracy, rather than two, its peak
accuracy is lower.
Figure 5.12: Results of uniform (blue) and accuracy sweep (red) methods.
Figure 5.12 shows the results for selected shifts. Clearly, the results are best when
there is no shift. The same pattern as before is evident. The accuracy sweep results
(red line) show the same smooth rise and larger accuracy than the uniform results.
The uniform results (blue line) show the same abrupt peak followed by a decrease
to zero for larger shifts. The plots in this figure were chosen to show the effects of
the bimodal curve in Figure 5.11. Note that small shifts (1 to 10) are similar to the
results for the previous database, with roughly 50% accuracy at shifts of 4 and 5. For
the region between the two peaks in the bimodal curve (10 to 55) the uniform results
show a negligible bump for few bins while the accuracy sweep results are non-zero
and varying. For shifts between 55 and 72, the curves follow the results for the first peak, but attain a lower maximum.
Our second synthetic database with two peaks is shown in Figure 5.13. This
database covers the entire hue axis and has an offset between the two objects of 128
Figure 5.13: Second two-peak database.
(half the potential hue space). This is a more uniform database, so we anticipate that
the accuracy sweep method should not increase the accuracy substantially. Because of
the symmetry in the database, we need only sweep the shifts from 0 to 8. Figure 5.14
shows the average results as a function of shift. There is a clear decrease in accuracy
as the shift is increased. Continuing to a shift of 16, it is reasonably clear that the
accuracy levels off at a shift of 8. We would expect that the accuracy of shifts between
16 and 120 would be approximately the same as the accuracy for shifts between 8
and 16, with shifts between 120 and 128 mirroring the shifts from 1 to 8.
Results for a variety of shifts are shown in Figure 5.15. Again, we have roughly 50% accuracy when the shift is 4, or half the bump size. We also continue to see the
clear pattern of increased accuracy for few bins when the shift is greater than half but
less than the full width of the bump. When the shift is larger than the bump size,
accuracy is decreased to almost zero. The accuracy sweep results for the flat section
Figure 5.14: Average accuracy for second two-peak database as a function of shift.
of the shift curve show a similar small bump for few bins as the uniform results in
the previous database.
5.2.4 More Complex Lighting Shifts
In this section, the databases from the previous subsections are tested with a
more complex lighting shift. The shift is generated from the images of cans under
different laboratory lighting conditions described in Section 4.2.2. Figure 5.16 shows
sample transforms. The black line of points corresponds to the transformation from
the average lighting condition (very close to the midday responses) to the earliest morning lighting condition (9 am daylight with fluorescent lights). The blue line corresponds to the transformation from the average to the latest evening condition (fluorescent lights, no daylight). In each case, the values from the cans in the image,
rather than the calibrator, were used to generate the transformations. If we want to
Figure 5.15: Results of uniform and accuracy sweep methods on second two-peak database.
perform color constancy with this data, we should make sure that a given location
on the x-axis fits into a bin that contains the entire difference between the blue and
black curves. Using the width between the curves between 75 and 125, roughly 16
categories would be necessary.
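This bin-width argument can be expressed as a simple calculation: the number of usable categories is roughly the length of the hue axis divided by the largest spread between the two lighting transforms. In the sketch below, the two curves are hypothetical stand-ins for the measured morning and evening transforms of Figure 5.16.

    import numpy as np

    def categories_needed(t_a, t_b, n_values=256):
        """Rough number of hue categories whose width covers the largest gap
        between two lighting transforms."""
        gap = np.abs(np.asarray(t_a, float) - np.asarray(t_b, float)).max()
        return int(n_values // gap)

    x = np.arange(256.0)
    t_morning = np.clip(x + 10.0 * np.sin(x / 40.0), 0, 255)   # hypothetical curve
    t_evening = np.clip(x - 8.0, 0, 255)                       # hypothetical curve
    print(categories_needed(t_morning, t_evening))             # ~14 for these stand-ins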
Figure 5.17 shows the original input in the upper plot and the warped input
generated assuming the original data were taken under the average illuminant and
the warped version was taken under the first lighting condition in the lower plot.
The original database corresponds to the single-peak database that spans the entire
hue space, as described in Section 5.2.2. However, the warped version is substantially
different from the simple shifts tested previously. Instead of a linear shift, the warping
stretches some regions and compresses others.
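The warping itself can be modeled as a per-hue lookup table that moves histogram mass from each input bin to its transformed location, stretching some regions of the hue axis and compressing others. The sketch below uses a hypothetical smooth transform in place of the measured one.

    import numpy as np

    def warp_histogram(hist, transform):
        """Push histogram mass through a per-hue lookup table: bin h of the
        input is added to bin transform[h] of the output."""
        out = np.zeros_like(hist)
        for h, mass in enumerate(hist):
            out[int(transform[h])] += mass
        return out

    hues = np.arange(256)
    transform = np.clip(hues + (12 * np.sin(hues / 30.0)).astype(int), 0, 255)

    hist = np.zeros(256)
    hist[40:48] = np.sin(np.linspace(0, np.pi, 8))      # one synthetic bump
    warped = warp_histogram(hist / hist.sum(), transform)
    print(warped.nonzero()[0])                          # where the bump ended up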
Figure 5.16: Transformation from average illuminant to early morning and evening conditions.
This non-uniform warping is apparent when we compare the results of the linear
shift with the results using the warped data. Figure 5.18 compares the linear shift of
one to the closest warped data (midday) to the average, and the most distant warped
data (early morning). The results in Figure 5.9 are comparable to the results with a
shift of 6. The accuracy sweep method reaches a peak of roughly the same accuracy,
but in the warped case, there is a substantial drop in accuracy between the 45th and
100th bins. In addition, the uniform results never drop to zero in the warped data
case in the way they do with a simple linear shift. However, at least for the accuracy
sweep method, there is a clear initial bump in the accuracy, followed by a decrease
as the number of bins increases.
Figure 5.17: Comparison of original and warped databases.
The accuracy sweep method does a wonderful job of enabling the algorithm to
adapt to given lighting changes. If the lighting is only going to change in one way, the
accuracy sweep method will design the bins to optimize for that shift. However, if the
examples that the method is using to train do not reflect the actual lighting conditions
to be encountered, the resulting bin locations can be worse than uniform ones. Figure 5.19 shows
the results when the bins generated for a single lighting shift are used to determine
the accuracy for the other lighting shifts.
The dotted lines are the accuracies generated in the test case, with the optimal
accuracy sweep for the training data in red and the uniform results in blue. The solid
red lines show the accuracy for the given shift used as cross-validation data when the
training data is the same training data used to generate the bin locations.
Figure 5.18: Results for training (original) and testing (warped).
The theoretical results showed that when there is no noise in the system, we
can expect at most a minor drop in accuracy when we quantize from many bins to
on the order of ten. For large databases, more than ten colors may be necessary,
depending on the size and composition of the database. These results indicate that
systems using quantization as a technique to improve efficiency are likely to work very
well. However, the theoretical results did not predict whether or not the algorithm is
capable of performing limited color constancy.
The synthetic results showed that in the training case, we can expect the accuracy
sweep method of determining the bin locations to outperform the uniform quantiza-
tion method. In addition, when the uniform quantization method is used with values
that have been shifted by more than half the width of the histogram peaks, accuracy is
Figure 5.19: Over-fitting of accuracy sweep method.
substantially higher for few categories than for many. Thus, for small linear hue shifts,
quantization can perform limited color constancy. This is shown by the increase in
the accuracy when only a few bins are used. The realistic lighting shift, however, did
not show perceptible color constancy. On training data, the accuracy sweep method
did seem to perform some form of color constancy, as the accuracy increased for fewer
bins. However, the uniform method did not show any significant change in accuracy
as a function of the number of bins, and the accuracy sweep method overfitted the
data and was thus unfit for use in environments where lighting is unpredictable.
STILL IMAGE RESULTS
To test our results from Chapter 5, we generated a series of databases. The small-
est database consisted of images taken under typical laboratory lighting conditions.
The database contained images of 9 different soda cans, under 4 different illuminants.
Each image was segmented to contain only the soda can in question, and no pre-
processing was done before the quantization stage. The four illuminants consisted
of ambient daylight, ambient fluorescent light, frontal daylight with left fluorescent light, and frontal daylight with right fluorescent light. Each can was in the same
location, in the same orientation, with the same surroundings, so differences due to
reflections, orientation shifts, and shadows were minimized.
A second database of 14 cans under 8 different illuminants was used to verify the
theoretical results. This database was intended to be much more difficult, as the
illuminants were common household illuminants, not pre-processed to compensate
for intensity or hue variations. As a result, the images in this database varied far
more than the others in terms of overall brightness and hue. A third database of 86
cans under 4 common laboratory illuminants and 8 different orientations was used to
test the effects of quantization on larger databases. Of this larger database, a subset
of 15 cans was used to test the results on a database of objects whose primary
colors all come from a single region of the hue axis. Databases were quantized to
2, 3, 4, 5, 6, 8, and 14 colors using the "tree" method described in Section 4.1.4.
The number of bins was also swept from 2 to 256 using the uniform and accuracy
sweep quantization methods along the hue axis, with only the hue information used to
identify the objects, as in the results on synthetic data in Chapter 5. In addition, the lighting shift method proposed at the end of Chapter 4 was used. All these results on
Figure 6.1: Sample soda images used in 14-can database. Soda shown here is Publix brand Diet Cola, under each of eight different illuminants.
the unprocessed database are compared to the results when the large database is pre-
processed with the multi-scale retinex with color restoration described in Chapter 2,
and to the results using simple normalization to an [r, g] chromaticity space.
6.1 Theoretical Comparison
Predictions were tested against a real database containing images of 14 soda cans.
Each can was oriented the same way in each image, to minimize the effects of ori-
entation changes. In Figure 6.1 the eight different lighting conditions are shown for
one can. The can shown is white, with markings in dark red and gray. Three images
were taken of each can under incandescent light: one with no other ambient light, one
with ambient (but not direct) sunlight, and one with full sun. One image was taken
of each can under ambient but not direct sun. Two images were taken of each can in
Figure 6.2: Colormaps for 2, 3, 4, 5, 6, 8 and 14 colors.
bright and dim halogen light, with and without sunlight, for a total of four images of
each can. Each of these 112 images were quantized to 2, 3, 4, 5, 6, 8, and 14 colors
using the data-independent tree method described in Section 4.1.4.
The colormaps generated for our quantization to 2, 3, 4, 5, 6, 8 and 14 colors
are shown in Figure 6.2. As the number of available colors increased, the number of
chromatic colors also increased. Because the object recognition algorithm is based
on color, it seemed appropriate to include as many hues as possible and minimize
achromatic discrimination. Colors whose saturation was below 0.2 were considered
achromatic, and colors whose lightness was below 0.1 were assigned to the darkest
achromatic color. The upper lightness threshold was set to 1, so even very pale colors
were considered chromatic.
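These thresholds amount to the following assignment rule (HLS values in [0, 1]). The lightness and saturation thresholds come from the description above; the split of the remaining achromatic pixels by lightness, and the handling of chromatic pixels, are placeholders, since the actual hue categories are produced by the tree quantizer of Section 4.1.4.

    def chromatic_or_achromatic(hue, lightness, saturation, n_achromatic=3):
        """Assign a pixel to an achromatic level or leave it chromatic."""
        if lightness < 0.1:
            return ('achromatic', 0)                # darkest achromatic color
        if saturation < 0.2:
            level = min(int(lightness * n_achromatic), n_achromatic - 1)
            return ('achromatic', level)            # gray levels split by lightness
        return ('chromatic', hue)                   # pale colors stay chromatic

    print(chromatic_or_achromatic(0.6, 0.05, 0.9))  # -> ('achromatic', 0)
    print(chromatic_or_achromatic(0.6, 0.90, 0.1))  # -> ('achromatic', 2)
    print(chromatic_or_achromatic(0.6, 0.95, 0.5))  # -> ('chromatic', 0.6)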
In the theoretical equation, each area of uniform color is assigned an index, and so
it is possible to derive accuracy values for k < p. For real data, an area was designated
Figure 6.3: Theoretical predictions (blue solid lines) for c = 2^24, p = 2 and p = 3, n = 3, and k varying. Real database results (red dashed lines) for c = 2^24, p = 3 and p = 5, n = 3, averaged over 20 sets, and k varying.
as the sum of all pixels of the same quantized color, so k could not be smaller than p.
For comparison with the theoretical predictions for varying k, we averaged the results
from 20 sets of three images, where each image in a given set was of a different can.
For comparison to the simulation data, we determined the accuracy for sets of 10, 20,
30, 50 and 100 images, without averaging over different combinations. Our database
of real images utilizes resubstitution, to better compare accuracies derived from real
data to the theoretical expected accuracy.
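The averaging and resubstitution protocol can be sketched as follows, with toy features standing in for the real can histograms. Errors occur only when quantization makes two features identical, since each set serves as both the database and the query set.

    import numpy as np

    def resubstitution_accuracy(features, n_sets=20, set_size=3, seed=0):
        """Average resubstitution accuracy over random small databases drawn
        from an (n_images, n_bins) array of normalized histograms."""
        rng = np.random.default_rng(seed)
        accs = []
        for _ in range(n_sets):
            db = features[rng.choice(len(features), size=set_size, replace=False)]
            correct = sum(int(np.argmin(np.linalg.norm(db - q, axis=1)) == i)
                          for i, q in enumerate(db))
            accs.append(correct / set_size)
        return float(np.mean(accs))

    # toy example: coarse one-hot features so that collisions (and errors) occur
    rng_toy = np.random.default_rng(1)
    feats = np.zeros((14, 4))
    feats[np.arange(14), rng_toy.integers(0, 4, 14)] = 1.0
    print(resubstitution_accuracy(feats))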
Figure 6.3 compares the theoretical and real data, for several values of p, as k varies from 1 to 25 and c is held at 2^24. The theoretical results (blue solid lines) predict high accuracy even for values of k as low as 5 (for both p = 2 and p = 3; above 99.5% for p = 3).
The results from the real image database (red dashed lines) are somewhat lower with
accuracy below 90% when k is 6 or less. However, when k equals 8 or 14 the accuracy