CISE TECHNICAL REPORT: REP2009467
REP2009467: Non-Lambertian Reflectance
Modeling and Shape Recovery for Faces using
Anti-Symmetric Tensor Splines
Ritwik Kumar, Student Member, IEEE, Angelos Barmpoutis, Student Member, IEEE,
Arunava Banerjee, Member, IEEE, and Baba C. Vemuri, Fellow, IEEE
Abstract—Modeling illumination effects and pose variations of a face is of fundamental importance in the field of facial image analysis.
Most of the conventional techniques that simultaneously address both of these problems work with the Lambertian assumption and
thus, fall short of accurately capturing the complex intensity variations that facial images exhibit or recovering their 3D shape in
the presence of specularities and cast shadows. In this paper, we present a novel antisymmetric tensor spline based framework for
facial image analysis. We show that using this framework, the facial apparent BRDF field can be accurately estimated while seamlessly
accounting for cast shadows and specularities. Further, using local neighborhood information, the same framework can be exploited to
recover the 3D shape of the face (to handle pose variation). We quantitatively validate the accuracy of the antisymmetric tensor spline
model using a more general continuous mixture of single lobed spherical functions. We demonstrate the effectiveness of our technique
by presenting extensive experimental results for face relighting, 3D shape recovery and face recognition using the Extended Yale B and
CMU PIE benchmark datasets.
Index Terms—Anti-Symmetric Tensor Splines, Non-Lambertian Reflectance, 3D Shape Recovery, Facial Image Analysis.
 
1 INTRODUCTION
PRECISELY capturing appearance and shape of objects
has engaged human imagination ever since the conception
of drawing and sculpting. With the invention
of computers, a part of this interest was translated into
the search for automated ways of accurate modeling and
realistic rendering of appearances and shapes. Among
all the objects explored via this medium, human faces
have stood out for their obvious importance. In recent
times, the immense interest in facial image analysis has
been fueled by applications like face recognition (on
account of recent world events), pose synthesis and face
relighting (driven in part by the entertainment industry)
among others. This in turn has led to a wealth of literature
on this subject, encompassing various techniques
for modeling and rendering appearances and shapes of
faces.
Our understanding of the process of image formation
and the interaction of light and facial surface has come
a long way since we started [34], with many impressive
strides along the way (e.g. [12], [25], [13]), but we are
still some distance from an ideal solution. In our view, an
ideal solution to the problem of modeling and rendering
appearances and shapes of human faces should be able
to generate extremely photorealistic renderings of a
person's face, given just one 2D image of the face, in
any desired illumination condition and pose, at a click
of a button (real time). Furthermore, such a system
This manuscript is under review at IEEE Transactions on Pattern Analysis
and Machine Intelligence.
All authors are with the Department of Computer and Information Science
and Engineering at the University of Florida, Gainesville, FL 32611, USA
should not require any manual intervention and should
not be fazed by the presence of common photo-effects like
shadows and specularities in the input. Lastly, such an
ideal system should not require expensive data collec
tion tools and processes, e.g. 3D scanners, and should
not assume availability of meta-information about the
imaging environment (e.g. lighting directions, lighting
wavelength etc.).
These general requirements have been singled out
because the existing state-of-the-art largely comprises
systems which relax one or more of these conditions
while satisfying others. Common simplifying assumptions
include applicability of the Lambertian reflectance
model (e.g. [12]), availability of 3D face model (e.g. [9]),
manual initialization (e.g. [7]), absence of cast shadows
in input images (e.g. [10]), availability of large amounts
of data obtained from custom-built rigs (e.g. [25]), etc.
These assumptions are noted as "simplifying" because 
human faces are known to be neither exactly Lambertian
nor convex (and thus can have cast shadows), fitting a
3D model requires time-consuming large-scale optimization
with manual selection of features for initialization,
specialized data acquisition can be costly, and in most
real applications only a few images of a face are available.
The method we propose in this paper moves the
state-of-the-art closer to the ideal solution by satisfying
more of the above mentioned attributes simultaneously.
Our technique can produce photorealistic renderings
of human faces across arbitrary illumination and pose
using as few as 9 images (fixed pose, known illumination
directions) with a spatially varying non-Lambertian
reflectance model. Unlike most techniques, our method
does not require input images to be free of cast shadows
or specularities and can reproduce these in the novel
renderings. It does not require any manual initialization
and is a purely image based technique (no expensive 3D
scans are needed). Furthermore, it is capable of working
with images obtained from standard benchmark datasets
and does not require specialized data acquisition.
Our technique¹ is based on a novel framework of antisymmetric
tensor splines which can be used to approximate
any n-dimensional field of spherical functions. In
the case of faces, we use antisymmetric tensor splines
to approximate the field of Apparent Bidirectional Reflectance
Distribution Functions (ABRDF) for a fixed viewing
direction. Unlike the BRDF, the ABRDF at each
pixel captures the variation in intensity as a function of
illumination and viewing direction and thus is sensitive
to the context of the pixel. Once the ABRDF field has
been captured, images of the face under the same pose
but with arbitrary illumination can be generated by simply
taking weighted combinations of the ABRDF field
samples. Next, we estimate the surface normal at each
pixel by computing the weighted Karcher mean of the
rotated normals of the neighborhood pixels. To recover
the requisite rotation we present an extremely efficient
linear method along with a more straightforward nonlinear
optimization based method. Once the rotations for
the ABRDF field have been determined, we initialize the
normals with the maxima of the ABRDF functions and
iteratively compute the surface normals. With as few as
1 or 2 iterations, we can recover the surface normal fields
of most faces, which are then numerically integrated to
obtain the face surfaces. Novel poses under novel illumination
conditions can then be rendered while seamlessly
accounting for attached as well as cast shadows.
The rest of the paper is organized as follows: In Section
2 we present a detailed survey of related work. In Section
3, we give an overview of our system and in Section 4,
the novel Anti-Symmetric Tensor Spline framework for
ABRDF modeling is introduced. In Section 5, we present
a validation model for the Anti-Symmetric Tensor Spline
method. In Section 6, the novel shape recovery procedure
is presented and in Section 7, we present detailed
experimental results and related discussion. Finally we
conclude in Section 8.
2 RELATED WORK
The sheer size of the facial shape-reflectance modeling
literature allows its taxonomy to be carried out along various
possible lines. Here we have classified methods based on
the various assumptions they make while modeling facial
reflectance and shape. We have also summarized a few
of the key methods, along with their associated assumptions, in
Table 1.
1. A part of the work presented here on relighting appeared in the
Proceedings of IEEE CVPR 2008 [29].
A large fraction of the existing techniques for facial
image analysis work with the Lambertian assumption
for reflectance. This translates to assuming that the BRDF
at each point on the object's surface has the same shape,
that of a half-cosine function, which has been scaled by
a constant albedo, and is oriented about the surface
normal at that location. One of the major reasons for the
prevalence of this model is its simplicity. Analysis has
shown that under this assumption, if cast and attached
shadows are ignored, the image of a convex object, in a fixed
pose, lit by arbitrary illumination lies in a 3-dimensional
subspace [35]. When an ambient lighting component is included,
this subspace expands to become 4-dimensional
[36] and when attached shadows are taken into account,
the subspace grows to become the infinite-dimensional
illumination cone [37].
Spherical harmonic analysis of the Lambertian kernel
has shown that even though the illumination cone is
infinite dimensional, it can be approximated quite well
by lower-dimensional subspaces ([39], [13], [12]). In
particular, these methods can produce impressive results
with 9 basis images, though they require the 3D shape and
albedo field as input. These basis images can also be
directly acquired using the "universal virtual" lighting
conditions [43]. More recently, this idea has been
extended to 3D surfaces in [7] building on the prior
seminal work presented in [9] called Morphable Models.
Morphable Models can recover 3D shape of a face by
fitting an average 3D facial model to a given 2D image,
accounting for necessary shape and texture adjustments.
Morphable Models are known to produce excellent results
for across-pose face recognition but cannot handle
cast shadows or specularities robustly. More importantly,
they require manual delineation of facial features to initialize
a complicated nonlinear optimization which can
take a long time to converge and can suffer from local
minima. Using the idea of low-dimensional subspaces
explored above, [33] represented the entire light field
using a low-dimensional eigen light field.
It has been suggested that even though the time and
cost of acquiring 3D data is decreasing, the majority
of face databases still remain 2D and hence it is
more pragmatic to work with 2D images alone [24].
Methods that are purely image based and work with
the Lambertian assumption generally apply photometric
stereo or shape from shading to recover facial shape from
the given images. For instance, results for simultaneous
shape recovery using photometric stereo and reflectance
modeling were presented in [21] and [10]. Both of these
methods work with multiple images and expect no cast
shadows and very little attached shadow in the images.
Here the cast shadows in the relit images are rendered
using ray tracing, which can be computationally expensive.
Examples of methods that recover shape from
shading working under the Lambertian assumption can
be found in [28] and [41]. As these methods work with
just a single image, besides requiring absence of cast
shadows in input, they make additional assumptions
TABLE 1
Requirements, Assumptions and Capabilities of Existing Methods
Method | Assumed Surface BRDF Model | No. of Images as Input | Relit Images Presented | Shape or Pose Results Presented | Cast Shadows Allowed in Input | Purely Image-based (No 3D Scans) | Other Assumptions, Requirements and Limitations
1999 MVIEW Lambertian > 3 X Near frontal illumination expected,
Georghiades et al.[21] Ray tracing for cast shadows.
1999 SIGGRAPH Phong 1 X X No attached shadows, Manual initial
Blanz et al.[9] ization to fit 3D model.
2000 SIGGRAPH Non > 2000 Custom rig for data collection, Struc
Debevec et al.[25] Lambertian tured lighting for shape.
2001 SIGGRAPH Lambertian > 3 X X X Distant and isotropic lighting, 3D
Ramamoorthi et al.[13] Scans needed as input.
2001 SIGGRAPH Non > 50 X X Custom rig for data acquisition, No
Malzbender et al.[38] Lambertian specularity allowed in input.
2001 PAMI Lambertian >7 X Almost no attached shadow, Symmet
Georghiades et al.[10] ric faces, Ray tracing.
2001 PAMI Lambertian 1 X X Bootstrap set of images required,
Shashua et al.[10] Ideal class assumption.
2001 IJCV Lambertian 1 X No attached shadows, Symmetric
Zhao et al.[28] faces, piecewise constant albedo.
2001 ICCV Non > 300 X Known lighting directions, Lighting
Magda et al.[26] Lambertian should doubly cover the directions
2003 EGSR Non >12 X 3 sources/pixel, adhoc shadow de
Georghiades et al.[22] Lambertian tection, Spatially constant BRDF.
2003 CVPR Diffuse 1 X X Symmetric lighting, 3D Model Fit
Wen et al.[4] ting, Manual initialization.
2003 PAMI Lambertian 1 X X X Distant & isotropic lighting, 3D scans
Basri et al. [12] required, Manual initialization.
2004 PAMI Lambertian > 1 X X Manual delineation of feature points
Gross et al. [33] for better recognition.
2005 ICCV Non 12 X Known lighting, HDR images ex
Goldman et al. [23] Lambertian pected, Manual threshold selection
2005 ICCV Non 1 X Custom data acquisition, 3D Model
Lee et al. [24] Lambertian fitting with manual initialization.
2005 PAMI Non >8 X X No shadows expected, Reference ob
Hertzmann et al. [20] Lambertian ject expected, Symmetry of faces.
2005 IAMF Lambertian 1 X X Shadowed pixel gets default albedo,
Lee et al. [31] Universal 3D face model required.
2006 PAMI Lambertian 1 X X 3D Model Fitting with manual initial
Zhang et al. [7] ization.
2006 PAMI Non >1 X X X Point lighting sources with known
Zickler et al. [14] Lambertian directions, Object shape required.
2007 CVPR Lambertian > 4 X 3 sources/pixel, Known lighting,
Chandraker et al. [5] Normals can't be on bisection planes.
2007 ICCV Non > 32 X Point light sources, Known direc
Alldrin et al. [16] Lambertian tions, BRDF isotropic about normal.
2007 ICCV Lambertian 1 X X Point sources with known directions,
Biswas et al. [30] Registered avg. 3D model required.
2007 PAMI Lambertian 1 X No attached shadows expected, Sym
Zhou et al. [6] metry of faces, Bootstrap set required.
2007 IJCV Lambertian 15 X X Distant and isotropic lighting, Works
Basri et al. [11] for only convex objects.
2008 CVPR Non > 102 X Point sources with known directions,
Alldrin et al. [17] Lambertian BRDF isotropic about normal.
Our Method Non > 9 Point sources with known directions.
Lambertian
like facial symmetry in [28]. An important point to note
here is that uncalibrated photometric stereo or shape
from shading methods that work with the Lambertian
assumption and orthographically projected images also
suffer from the Bas-Relief Ambiguity ([40]). Resolving
this requires additional assumptions, like symmetry of the
face, the nose and forehead being at the same height, known
lighting directions, etc., or manual assistance.
Recently, shape recovery using generalized photometric
stereo was presented in [11], which relaxes some of
the assumptions made by traditional photometric stereo.
This method can recover shape from images taken under
general unknown lighting. On account of the Lambertian
assumption, cast shadows are not permitted in
the input images and the shape of the object is assumed
to be convex. Note that accurate recovery of shape
using this method requires 15 to 60 images as input.
Another method for Lambertian shape recovery with
multiple illuminants, but without ignoring shadows, was
presented in [5], where the graph cuts method was used
to identify light source visibility and information from
shadow maps was used to recover the shape.
In contrast to most of the methods mentioned above are
the techniques that seek illumination invariant representations
of faces, which can then be used to render relit
images. Seminal work in this category was presented
in [32], where so called "Quotient Images", generated
using ratio of albedo values, were used to generate
images under novel illumination. More recently, use of
invariants was invoked in [4], where a radiance environment
map was deduced using the ratio image technique
([42], [32]). Note that the shape recovery in [4], like
the Morphable Models, requires manual initialization.
Forgoing the ratio technique, direct use of albedo as an
illumination invariant signature of face images was explored
in [31], where, using a universal 3D face model,
illumination normalized images of faces were generated.
This method worked with low resolution images and
did not render high quality relit images. More recently
an improvement was presented in [30] where the albedo
estimation was made more robust using error statistics
of surface normals and known illumination direction.
This method requires a registered average 3D model
of face and does not allow cast shadows in input but
as compared to [31], it provides better shape recovery.
Improving upon the idea of ideal class assumption ([32]),
a generalized photometric stereo was presented in [6].
Using a bootstrap set of facial images and exploiting the
subspace spanned by a set of basis objects with Lambertian
surfaces, images with novel pose and illumination
were generated. Faces were assumed to be symmetric and
the input was assumed to be free of shadows.
Next, we look at techniques that do not make the
Lambertian assumption. Seminal work in this class of
techniques was presented in [25] where, using a custom-built
rig, dense sampling of the illumination space for
faces was obtained. In this work the facial shape was
obtained using structured lighting and no assumption
about the surface BRDF was made. This completely data-driven
technique was able to produce extremely photorealistic
images of the face in novel illumination and
pose. Specular component was captured using polarized
lighting and modified appropriately for pose variation.
This method demonstrated that if a large number of
images (> 2000) for each subject can be obtained under
various lighting configurations, the relighting and pose
generation problem can be solved, but the cost of such
a system can be extremely high.
Use of biquadratic polynomials to model texture was
explored in [38]. This method required a custom-built rig
and more than 50 specularity-free images to recover the
model parameters. The shape of the object was not recovered
in this method. Use of a large number (> 300) of images
to recover the shape without making any assumption
about the BRDF nature was revisited in [26]. This method
required input images to doubly cover the illumination
direction which called for specialized data acquisition.
No attempt to capture the reflectance properties of the
object was made in this work.
One of the first techniques that worked with standard
face databases and did not require custom data was
presented in [22], where the more general Torrance-Sparrow
([51]) model for the BRDF was used. This method
presented relighting and pose variation results with 12
images as input but did not allow cast shadows. Further,
this method required each pixel to be lit by at least 3 light
sources in order to work properly.
An important contribution in the field of example based
shape recovery was made by [19], where objects of interest
were imaged along with a reference object of known
geometry (e.g. a sphere). Multiple (> 8) cast-shadow-free
images were used as input. This work was expanded to
allow spatially varying BRDFs by using multiple references
in [20]. An interesting extension of this work was
presented in [23] where shape of an object was recovered
using the assumption that the BRDF of any object is
essentially composed of BRDFs of a few fundamental
materials. This method used 12 High Dynamic Range
images with known illumination direction and required
manual selection of a system parameter threshold. Using
a large amount of data obtained from a custom-built rig,
[24] presented a technique for pose and illumination
variation using a single image, which used Morphable
Models to recover the 3D shape and higher-order SVD
to recover the illumination subspace. This method does
not explicitly make the Lambertian assumption but it
requires manual initialization for the 3D model fitting.
When the 3D shapes of the objects are assumed available,
[14] presented a technique which, at times using
just 1 image, can recover their spatially varying non-parametric
BRDF fields. For the case of human faces, this
work presented results with 150 images where the specular
component was separately captured using polarized
lighting. The images were acquired from known illumination
directions and no cast shadows were allowed.
Recently, [16] presented a new method for photometric
reconstruction of shape assuming spatially varying but
isotropic BRDFs. Given 32 or more images with known
illumination, this method recovers isocontours of the
surface depth map from which shape can be recovered
by imposing additional constraints. An extension of
this work was presented in [17] where the need for
additional constraint to recover shape from depth map
isocontour was alleviated by assuming the surface to
be composed of a few fundamental materials and that
BRDF at each point can be approximated by a bivariate
function. Results presented in this work required 102 or
more images.
Lastly, we note that in cases when extremely high
quality renderings are required and cost-time constraints
are relaxed, custom hardware is employed. For instance,
highly accurate measurements of material BRDF were
carried out using gonioreflectometer in [45], various
customized hardware components and software were
used to render face images in the movie "The Matrix
Reloaded" [44] and in order to measure accurate skin
reflectance while accounting for subsurface scattering
again custom built devices were employed in [46] to
render high quality facial images.
It can be noticed that most of the image based techniques
that do not make the simplifying Lambertian assumption
end up using a large amount of custom acquired data or
Fig. 1. Symmetric and antisymmetric ABRDF approximations. (a) A semicircular function. (b) Approximation of (a) using symmetric functions. (c) Max of the function in (b) and zero. (d) Approximation of (a) using antisymmetric functions. (e) Max of the function in (d) and zero. (f) ABRDF approximated with symmetric functions leads to unnatural lighting. (g) ABRDF approximated with antisymmetric functions leads to more natural lighting.
assuming some other parametric form for BRDF (besides
the other assumptions). In this paper we explore the
possibility of acquiring the non-Lambertian reflectance
and shape with just nine images in a purely data driven
fashion.
3 OVERVIEW
The technique that we propose in this paper simultaneously
captures both shape and reflectance properties of
a face. Unlike the majority of existing techniques that
work with BRDFs, in order to seamlessly account for
specularities, attached shadows, cast shadows and other
photo-effects, we have chosen to work with the ABRDFs,
which are spherical functions of nontrivial shape. We
estimate them using Cartesian tensors, which in practice,
have enough flexibility to account for the variations
in ABRDF across the human face. Further, in order to
robustly estimate the ABRDF field from only a few
and often noisy samples, we draw upon the apparently
smooth variation of reflectance properties across the
face and combine the Cartesian tensors with B-Splines.
Finally, the scarcity of data, paired with the nature of
facial ABRDFs, forces us to use only the antisymmetric
Cartesian tensor components. This combination of antisymmetric
Cartesian tensors and B-Splines is called Anti-Symmetric
Tensor Splines.
Embedded in the ABRDFs at each pixel also lies
the surface normal of the shape. To extract the normal
from the ABRDF field riddled with cast shadows and
specularities, we invoke the homogeneity of the ABRDFs
in local neighborhoods, and infer the surface normal at a pixel
using the information from its immediate neighbors.
More concretely, at each pixel we align the ABRDF with
its neighbors using our linearized algorithm for rotation
recovery and take a weighted geodesic mean of the
normals suggested by the neighbors to obtain the surface
normal. Our framework automatically discounts possibly
erroneous surface normal suggestions by weighting
the suggestion from a neighbor of substantially different
shape lower than others. This process can be repeated
iteratively and in practice we find good solutions within
1 or 2 iterations.
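The weighted geodesic mean step described above can be sketched as an iterative tangent-space average on the unit sphere. The function below is an illustrative sketch under our own simplifications (explicit log/exp maps on the sphere, a fixed iteration cap, our own tolerances), not the paper's exact procedure; the name `weighted_sphere_mean` is hypothetical.

```python
import numpy as np

def weighted_sphere_mean(normals, weights, iters=10):
    """Weighted geodesic (Karcher) mean of unit vectors on the sphere.

    Illustrative sketch only: each iteration lifts the neighbor normals
    to the tangent plane of the current estimate via the log map, takes
    the weighted average there, and maps back with the exp map.
    `normals` is (N, 3) with unit rows, `weights` is (N,).
    """
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    mu = normals[np.argmax(w)].astype(float).copy()  # start at heaviest neighbor
    for _ in range(iters):
        # Log map: tangent vectors of all normals at mu
        dots = np.clip(normals @ mu, -1.0, 1.0)
        theta = np.arccos(dots)                      # geodesic distances to mu
        perp = normals - dots[:, None] * mu          # components orthogonal to mu
        norms = np.linalg.norm(perp, axis=1)
        safe = norms > 1e-12
        logs = np.zeros_like(perp)
        logs[safe] = perp[safe] * (theta[safe] / norms[safe])[:, None]
        step = (w[:, None] * logs).sum(axis=0)       # weighted tangent-space mean
        a = np.linalg.norm(step)
        if a < 1e-10:
            break
        # Exp map: move along the geodesic and renormalize
        mu = np.cos(a) * mu + np.sin(a) * step / a
        mu /= np.linalg.norm(mu)
    return mu
```

Down-weighting neighbors of substantially different ABRDF shape then amounts to passing smaller entries in `weights` for those neighbors.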
Equipped with this mechanism to capture both reflectance
properties and shapes of human faces, we
can generate images of any face in novel poses and
illumination conditions.
3.1 Assumptions
Like all other techniques, our method also works with
certain assumptions. It requires at least 9 images of the
face under point illuminations from known directions
in a fixed pose. Note that these assumptions have been
used in the past by various methods, for example, [23]
worked with 12 images obtained from known lighting
directions in fixed pose. As the number of input images
increases so does the performance of our method. We
do not restrict input images to be free of attached or cast
shadows. We also do not restrict the BRDF to be Lamber
tian ([10]) or isotropic ([16], [14]). Though global photo-effects
like subsurface scattering and interreflection are
not explicitly modeled, antisymmetric tensor splines can
capture them to some extent.
4 ANTI-SYMMETRIC TENSOR SPLINES
We seek a mathematical framework that can represent a
field of spherical functions accurately. If a dense enough
sampling of the spherical function field is provided,
this can be accomplished to arbitrary accuracy, but the
central problem we face is precisely the scarcity of the
data. To solve this problem for the case of human facial
ABRDF fields, we exploit clues from the specific nature
of ABRDFs on human faces, e.g. smooth variation of
the ABRDF for the most part, presence of multiple lobes in
the ABRDF, etc. Note that the pose is assumed to be fixed
and hence the term "ABRDF" is used to refer to a spherical
function of illumination direction.
4.1 Spherical functions modeled as Tensors
A spherical function in $\mathbb{R}^3$ can be thought of as a function
of directions or unit vectors, $\mathbf{v} = (v_1\ v_2\ v_3)^T$. Such
a function, $T$, when approximated using an $n$-th order
Cartesian tensor (a tensor in $\mathbb{R}^3$), is expressed as
$$T(\mathbf{v}) = \sum_{k+l+m=n} T_{klm}\, v_1^k\, v_2^l\, v_3^m \qquad (1)$$
where $T_{klm}$ are the real-valued tensor coefficients. This
is a Cartesian tensor with all the n arguments set to be v.
The expressive power of such Cartesian tensors increases
with their order. Geometrically this translates to presence
of more "lobes" on a higher order Cartesian tensor.
Note that the Lambertian model is intricately connected
to a special case of the Cartesian tensor formulation.
If $\mathbf{l} = (l_1\ l_2\ l_3)^T$ is the illumination direction, $\mathbf{x} = (x_1\ x_2\ x_3)^T$
is the viewing direction and $\rho$ is the surface
albedo, the Lambertian kernel is given by
$$\max(\rho\,\mathbf{l}\cdot\mathbf{x},\,0) = \rho\,\max(l_1 x_1 + l_2 x_2 + l_3 x_3,\,0) = \max\Big(\sum_{k+l+m=1} T_{klm}\, x_1^k\, x_2^l\, x_3^m,\,0\Big) \qquad (2)$$
with $T_{100} = \rho\, l_1$, $T_{010} = \rho\, l_2$ and $T_{001} = \rho\, l_3$. A
comparison with Eq. 1 reveals that the Lambertian kernel
is exactly the positive half of the 1st order Cartesian
tensor.
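As a concrete illustration, Eq. 1 and its Lambertian special case (Eq. 2) can be evaluated directly from the tensor coefficients. The sketch below is ours; the coefficient dictionary layout, the function names, and the sample albedo and light direction are illustrative assumptions, not part of the paper.

```python
def eval_cartesian_tensor(coeffs, v):
    """Evaluate T(v) = sum_{k+l+m=n} T_klm * v1^k * v2^l * v3^m (Eq. 1).

    `coeffs` maps exponent triples (k, l, m), with k + l + m = n,
    to the real-valued tensor coefficients T_klm.
    """
    v1, v2, v3 = v
    return sum(c * v1**k * v2**l * v3**m
               for (k, l, m), c in coeffs.items())

# Lambertian kernel as the positive half of a 1st-order tensor (Eq. 2),
# with a hypothetical albedo rho and unit light direction l:
rho = 0.8
l = (0.6, 0.0, 0.8)
first_order = {(1, 0, 0): rho * l[0],
               (0, 1, 0): rho * l[1],
               (0, 0, 1): rho * l[2]}

def lambertian(x):
    # Clamp the odd (antisymmetric) 1st-order tensor at zero
    return max(eval_cartesian_tensor(first_order, x), 0.0)
```

Evaluating `lambertian` at directions on either side of the equator makes the half-cosine clamping explicit, and dictionaries of higher-degree exponent triples give the higher order, multi-lobed tensors in the same way.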
The 1st, 2nd, 3rd and 5th order Cartesian tensors have
3, 6, 10 and 21 unique coefficients respectively. For even
orders, the Cartesian tensors are symmetric, $T(-\mathbf{v}) = T(\mathbf{v})$,
while for odd orders they are antisymmetric,
$T(-\mathbf{v}) = -T(\mathbf{v})$. We must point out that these definitions
of symmetry and antisymmetry are different from the
standard definitions based on switching of the arguments'
order. In this paper, we use the definitions
provided above.
Finally, though the higher order tensors can be more
expressive, they can be perceived to be more sensitive to
noise due to their ability to model high frequency details.
In contrast, the lower order tensors are incapable of modeling
high frequency information but arguably are more
robust to noise. Since it is impossible to discriminate
between high frequency detail and noise in the data, it
is reasonable to say that the high order tensors possess
higher noise sensitivity. Thus, any approximation task
must strike a balance between the high frequency data
fidelity and the noise sensitivity.
Fig. 2. Recovered ABRDFs for a human face. Complex
shapes of ABRDFs in various regions of the face can be
readily noted. (This image is best viewed in color.)
4.2 Tensor Splines
When the task requires estimation of a p-dimensional
field of multi-lobed spherical functions from sparse and
noisy data, given the high noise sensitivity of higher order
tensors, it might be reasonable to enforce smoothness
across the field of spherical functions. We accomplish
this by combining the Cartesian tensor basis at each pixel
with the B-Spline basis ([47]) across the lattice of spherical
functions.
We define a tensor spline as a B-spline of multilinear
functions of any order. In a tensor spline, the multilinear
functions are weighted by the B-spline basis $N_{i,k+1}$,
where
$$N_{i,1}(t) = \begin{cases} 1 & \text{if } t_i \le t < t_{i+1} \\ 0 & \text{otherwise} \end{cases} \qquad (3)$$
and
$$N_{i,k}(t) = N_{i,k-1}(t)\,\frac{t - t_i}{t_{i+k-1} - t_i} + N_{i+1,k-1}(t)\,\frac{t_{i+k} - t}{t_{i+k} - t_{i+1}} \qquad (4)$$
The $N_{i,k+1}(t)$ are polynomials of degree $k$, associated
with $n+k+2$ monotonically increasing numbers called
"knots" $(t_1, t_2, \ldots, t_{n+k+2})$.
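The recursion of Eqs. 3-4 is the standard Cox-de Boor recurrence and can be sketched directly. The function below is an illustrative implementation under our own conventions (order `k`, i.e. degree `k-1`; zero-denominator guards for repeated knots); the name is hypothetical.

```python
def bspline_basis(i, k, t, knots):
    """Cox-de Boor recursion for the B-spline basis N_{i,k} (Eqs. 3-4).

    `k` is the order (degree k-1); `knots` is a non-decreasing sequence.
    Sketch for illustration; repeated knots are handled only by
    guarding against zero denominators.
    """
    if k == 1:
        # Eq. 3: indicator of the knot span [t_i, t_{i+1})
        return 1.0 if knots[i] <= t < knots[i + 1] else 0.0
    left = 0.0
    d1 = knots[i + k - 1] - knots[i]
    if d1 > 0:
        left = (t - knots[i]) / d1 * bspline_basis(i, k - 1, t, knots)
    right = 0.0
    d2 = knots[i + k] - knots[i + 1]
    if d2 > 0:
        right = (knots[i + k] - t) / d2 * bspline_basis(i + 1, k - 1, t, knots)
    return left + right
```

On a uniform knot vector the order-4 (bicubic per axis) basis functions used later sum to one in the interior, which is what makes the spline a local weighted average of control tensors.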
A tensor spline for a p-dimensional lattice of spherical
functions, with $k$-th degree spline and $n$-th order Cartesian
tensor, is defined as
$$S(\mathbf{t}, \mathbf{v}) = \sum_{\mathbf{i}} \Big( \prod_{a=1 \ldots p} N_{i_a,k+1}(t_a) \Big)\, T_{i_1 \ldots i_p}(\mathbf{v}) \qquad (5)$$
where $\mathbf{t} = (t_1 \ldots t_p)$ is the index into the lattice, $\mathbf{v} = (v_1\ v_2\ v_3)^T$
is a unit vector, and $T_{i_1 \ldots i_p}(\mathbf{v})$ is given by Eq. 1.
In tensor splines the usual B-Spline control points
have been replaced by control tensors $T_{i_1 \ldots i_p}(\mathbf{v})$. The
formulation presented in Eq. 5 is quite general as it can
be used to estimate a spherical function field defined over
Fig. 3. Images synthesized using tensor splines under
novel illumination direction (mentioned on each image as
(azimuth,elevation)). The 9 images used as input were illuminated
from (-20,60), (0,45), (20,60), (-50,0), (0,0), (50,0),
(-50,40), (0,35) and (50,40) directions.
arbitrary dimensional lattice with the desired degree of
BSpline smoothing.
4.3 Facial ABRDF approximation using Tensor
Splines
Human faces are known to be neither exactly Lambertian
nor convex, which leads to photo-effects like specularities
(oily forehead and nose tip) and cast shadows
(around protruding features like nose and lips) in facial
images. These effects cause such a complex variation
in the intensity values at various pixels as the lighting
direction changes that it cannot be captured by a
single-lobed function (like the Lambertian kernel). This
motivated us to explore the use of higher order tensor
splines while modeling the ABRDFs. Note that here the
lattice is 2-dimensional and the assumption of local
homogeneity also holds to a reasonable degree in the case of
facial ABRDFs. In order to ensure that the smoothness is
manifested only in a localized fashion, we have chosen to
use bicubic B-Splines in the ABRDF-specialized version
of tensor splines.
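Before specializing to two dimensions, the per-pixel core of this estimation can be sketched as an ordinary linear least-squares fit: the tensor coefficients enter the model linearly in the monomials of the lighting direction. The sketch below is a simplified, per-pixel illustration under our own assumptions (a single 3rd-order antisymmetric tensor per pixel, plain least squares, hypothetical function names); it omits the bicubic B-spline coupling across neighboring pixels that the paper's energy function includes.

```python
import numpy as np

# Exponent triples (k, l, m) with k + l + m = 3: the 10 unique
# coefficients of a 3rd-order (antisymmetric) Cartesian tensor.
EXPONENTS = [(k, l, m) for k in range(4) for l in range(4)
             for m in range(4) if k + l + m == 3]

def fit_pixel_abrdf(intensities, light_dirs):
    """Least-squares fit of 3rd-order tensor coefficients at one pixel.

    `intensities` is (Q,) observed pixel values; `light_dirs` is (Q, 3)
    unit illumination directions. Each row of the design matrix holds
    the monomials v1^k v2^l v3^m of one lighting direction.
    """
    A = np.stack([[v[0]**k * v[1]**l * v[2]**m for (k, l, m) in EXPONENTS]
                  for v in light_dirs])
    coeffs, *_ = np.linalg.lstsq(A, np.asarray(intensities, float), rcond=None)
    return dict(zip(EXPONENTS, coeffs))

def eval_abrdf(coeffs, v):
    """Evaluate the fitted per-pixel ABRDF tensor (Eq. 1, n = 3)."""
    return sum(c * v[0]**k * v[1]**l * v[2]**m
               for (k, l, m), c in coeffs.items())
```

In the paper's formulation the per-pixel coefficients are replaced by B-spline mixtures of neighboring control tensors, which is what lets 9 images suffice even though a lone 3rd-order tensor has 10 unknowns.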
We must point out that as the order of the Cartesian
tensors increases, so does the amount of data samples
required to estimate the unknown coefficients. When
there are only a few images available, in order to satisfy
our desire to use higher order tensors, we must choose
between their antisymmetric or symmetric components.
Note that since most of the time we are interested
in the ABRDFs' behavior on the frontal hemisphere, both
symmetric and antisymmetric components provide the
same representation power. Their behavior only becomes
pertinent when the illumination direction is exactly perpendicular
to the pose direction, and this is where use
of antisymmetric components is advantageous.
This has been explained via a 2D example in Fig.
1, where Fig. 1(a) shows a semicircular function.
The blue circle in the figure is considered to be the
zero value. Fig. 1(b) and 1(d) show the same function
being approximated by antipodally symmetric and
antisymmetric functions respectively. It can be noted
that the approximation is quite accurate except near
the angles 0° and 180°. When the original function
(Fig. 1(a)) is such that it has a positive value at one of
these antipodal points and a near zero value at the other,
a symmetric function forces the value at both of these crucial
angles to be positive while the antisymmetric function
forces one to be positive and the other to be negative. Now, if
we assume that only the positive values of the function
are preserved, we get the results as presented in Fig. 1(c)
and 1(e).
The behavior of most facial ABRDFs is similar to the
function in Fig. 1(a). This is because if a pixel has a high
intensity value when lit from 0°, most of the time it
would have a dark value when lit from 180° (due to
attached and cast shadows) and vice versa. Thus, if
a symmetric function is used for approximating such
an ABRDF, it would cause nonnegative values at
both 0° and 180° and would lead to visually significant
artifacts (unnatural lighting) in the images (Fig. 1(f)).
On the other hand, in practice, use of an antisymmetric
function does not cause visually significant artifacts (Fig.
1(g)). To summarize, even though both antisymmetric
and symmetric functions introduce artifacts near the 0° and
180° directions, the artifacts created by the antisymmetric
approximation are visually insignificant and hence we
have chosen to work with antisymmetric components.
Two dimensional tensor splines with bicubic B-Splines
and antisymmetric tensors can be written as
$$S(\mathbf{t}, \mathbf{v}) = \sum_{i,j} N_{i,4}(t_1)\, N_{j,4}(t_2)\, T_{i,j}(\mathbf{v}) \qquad (6)$$
where $i$, $j$, $\mathbf{t}$ and $\mathbf{v}$ have the same meaning as
before and the tensor has an odd order.
The problem at hand is that given a set of Q face
images ($I_q$, $q = 1 \ldots Q$) of a subject in a fixed pose,
along with the associated lighting directions $\mathbf{v}_q = (v_{q1}\ v_{q2}\ v_{q3})^T$,
we want to estimate the ABRDF field of the face
using a bicubic tensor spline. We propose to accomplish
this by minimizing the following energy function, which
measures the L2 distance between the model and the
CISE TECHNICAL REPORT: REP2009467
Fig. 4. Images relit with complex lighting. 1st image of both subjects is lit by a point source while the next two are lit by
Eucalyptus Grove and St. Peter's Basilica light probes respectively. Light probes are provided below the facial images.
(This image is best viewed in color.)
given data,
E(Tijkm)
EE(E
q= lt,t, i,j
SlttyE(E
q=1 tz,ty i,j
E T k '
++ ijklmVq V
k+l+m=n
Iq(tx, ty))2 (7)
where t, ty run through the lattice of the given images
and the tensor order n is an odd integer. The mini
mization of Eq. 7 is done with respect to the unknown
tensor coefficients Ti,jk,l,m that correspond to the control
tensors Tij (vn).
To keep the formulation simple, we use a uniform grid of knots in both lattice coordinates. If the pixel lattice size in the given images is M × M, there are (M + 2) × (M + 2) control tensors. The number of unknown coefficients at each control tensor is governed by the order of the tensor. In the case of 1st order tensors there are 3(M + 2)² unknowns, in the case of 3rd order tensors 10(M + 2)², and in the case of 5th order tensors there are 21(M + 2)² unknowns.
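The counts above are simply the number of distinct monomials v1^k v2^l v3^m with k + l + m = n, i.e. (n+1)(n+2)/2 per control tensor, times the number of control tensors. A quick illustrative sketch (ours, not from the paper):

```python
# Number of independent coefficients of a symmetric order-n Cartesian
# tensor in 3D: the monomials v1^k v2^l v3^m with k+l+m = n,
# i.e. (n+1)(n+2)/2. Total unknowns for an M x M image follow from the
# (M+2) x (M+2) control-tensor grid.

def tensor_coeffs(n):
    return (n + 1) * (n + 2) // 2

def total_unknowns(n, M):
    return tensor_coeffs(n) * (M + 2) ** 2

print([tensor_coeffs(n) for n in (1, 3, 5)])  # [3, 10, 21]
```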
We recover the unknowns in Eq. 7 using the nonlinear conjugate gradient method, with the control tensor coefficient field initialized to unit vectors. This technique can be efficiently implemented because we have obtained the closed form of the derivative of the objective function with respect to the unknown coefficients as

∂E/∂T_{ijklm} = 2 Σ_{q=1}^{Q} Σ_{t_x,t_y} ( Σ_{i,j} N_{i,4}(t_x) N_{j,4}(t_y) T_{ij}(v_q) − I_q(t_x, t_y) ) N_{i,4}(t_x) N_{j,4}(t_y) v_{q1}^k v_{q2}^l v_{q3}^m.   (8)
Once the coefficients have been recovered, images under a novel illumination direction v can be synthesized by evaluating the ABRDF field in the direction v, where each ABRDF is given by Eq. 5. Possible negative values obtained in Eq. 5 are set to zero (as in the Lambertian model). Furthermore, it should be noted that the generated images can be readily upsampled by evaluating Eq. 5 on a denser sampling lattice, since the tensor spline is a continuous function.
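To make the per-pixel part of this estimation concrete, the following minimal sketch (ours, not the paper's implementation) fits a single order-3 ABRDF to Q intensity samples by linear least squares over the 10 cubic monomials, ignoring the B-spline coupling across pixels, and relights it in a novel direction with negative values clamped to zero:

```python
import numpy as np
from itertools import combinations_with_replacement

# Monomial basis of an order-3 Cartesian tensor: v1^k v2^l v3^m, k+l+m=3.
def basis3(v):
    idx = list(combinations_with_replacement(range(3), 3))  # 10 monomials
    return np.array([v[i] * v[j] * v[k] for i, j, k in idx])

def fit_abrdf(dirs, intensities):
    # Least-squares fit of the 10 coefficients from Q >= 10 samples.
    A = np.array([basis3(v) for v in dirs])
    coeffs, *_ = np.linalg.lstsq(A, intensities, rcond=None)
    return coeffs

def relight(coeffs, v):
    # Evaluate in a novel direction; clamp negatives as in the model.
    return max(basis3(v) @ coeffs, 0.0)
```

With the spline term included, the same least-squares structure remains; the B-spline basis functions merely couple the control tensors of neighboring pixels.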
5 CONTINUOUS MIXTURE OF SINGLE-LOBED FUNCTIONS
In order to quantitatively validate whether the tensor spline provides a good enough approximation of the ABRDF field, we present a more expressive model here. This validation model is more general in the sense that it can accommodate an arbitrarily large number of lobes to approximate any spherical function. We define it using a continuous mixture ([48]) of single-lobed spherical functions. A continuous mixture is characterized by a kernel function and a mixing density, and for the spherical domain it can be represented as

B(v) = ∫_{S²} f(u) k(u, v) du,   (9)

where u and v are directions and the integration is over the sphere.
Of the various choices for single-lobed spherical functions that can be used as the kernel function k(u, v), we pick k(u, v) = e^{u·v} − 1, primarily for three reasons. Firstly, it has a single peak; secondly, k(u, v) = 0 for all v such that v·u = 0 (because if the viewing and illumination directions are perpendicular we expect zero intensity); and thirdly, it leads to a closed form for the continuous mixture, which facilitates an efficient implementation. Note that the first two properties are also satisfied by the Lambertian kernel.
Since the kernel function is parameterized by directions, we use a mixing density, f(u), which is also
Fig. 5. The 1st image is generated using tensor splines, the 2nd image is the ground truth and the 3rd is generated using the Lambertian model. Cast shadows and specularities are rendered much more realistically using tensor splines than with the Lambertian model.
defined on the space of directions. The spherical analog of the Gaussian density, the von Mises-Fisher density, is the natural choice here, and it is expressed as

f(u | κ, μ) = κ / (4π sinh(κ)) e^{κ μᵀu},   (10)

where μ is a unit vector defining the mean orientation and κ is a scalar governing the concentration of the distribution. Substituting our choices for kernel and mixing density in Eq. 9, we obtain the following expression:
B(v) = ∫_{S²} κ / (4π sinh(κ)) e^{κ μᵀu} (e^{uᵀv} − 1) du.   (11)

Here we make the important observation that this integral is the Laplace transform of the von Mises-Fisher distribution, which we have analytically computed to be

B(v) = κ sinh(‖κμ + v‖) / (sinh(κ) ‖κμ + v‖) − 1.   (12)
However, since the expression in Eq. 12 is a convolution of two single-lobed functions, it is itself single-lobed. Therefore, to get a model with possibly multiple lobes, we propose to use a finite mixture of von Mises-Fisher distributions as the mixing density, which leads to an alternate definition of the mixing density as f(u) = Σ_{i=1}^{M} w_i f(u | κ, μ_i), where w_i are the mixture weights and μ_i are the various directions along which the component von Mises-Fisher distributions are oriented. Using this mixture of von Mises-Fisher distributions we obtain the following expression for the continuous mixture:

B(v) = Σ_{i=1}^{M} w_i ( κ sinh(‖κμ_i + v‖) / (sinh(κ) ‖κμ_i + v‖) − 1 ).   (13)

We must emphasize that although f(u) has the form of a discrete mixture, the approximating function B(v) is still a continuous mixture of the single-lobed kernel functions k(u, v).
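As a numerical sanity check (ours; the parameters below are arbitrary), the closed form of Eq. 12 can be compared against a direct Monte Carlo evaluation of the integral in Eq. 11:

```python
import numpy as np

def closed_form_B(v, mu, kappa):
    # Eq. 12: kappa*sinh(||kappa*mu + v||) / (sinh(kappa)*||kappa*mu + v||) - 1
    s = np.linalg.norm(kappa * mu + v)
    return kappa * np.sinh(s) / (np.sinh(kappa) * s) - 1.0

def monte_carlo_B(v, mu, kappa, n=200_000, seed=0):
    # Eq. 11: integral over S^2 of f(u)*(e^{u.v} - 1), f the vMF density,
    # estimated with uniform sphere samples (4*pi*f absorbs the area).
    rng = np.random.default_rng(seed)
    u = rng.normal(size=(n, 3))
    u /= np.linalg.norm(u, axis=1, keepdims=True)
    w = (kappa / np.sinh(kappa)) * np.exp(kappa * (u @ mu))
    return np.mean(w * (np.exp(u @ v) - 1.0))
```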
The task of estimating ABRDFs using this continuous mixture requires us to recover the unknown weights such that the weighted combination leads to a spherical function which closely approximates the ABRDFs. Given a set of N facial images with the same fixed pose and associated lighting directions v_n, we can set up an N × M matrix A by evaluating Eq. 13 for every v_n and μ_i, where M is the number of μ_i picked in the model. The unknown weights (Eq. 13) for each pixel can then be estimated by solving the overdetermined system AW = B, where B is an N-dimensional vector of the intensities at a fixed pixel in the N given images, and W is the vector of the unknown weights. Since the ABRDF is a non-negative function, we solve this system with the positivity constraint using the non-negative least squares minimization algorithm developed in [49].
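The per-pixel weight estimation can be sketched as follows, here using SciPy's nnls as a stand-in for the algorithm of [49] (SciPy implements the Lawson-Hanson active-set method); the lobe directions, κ and synthetic weights below are illustrative, not from the paper:

```python
import numpy as np
from scipy.optimize import nnls

def mixture_row(v, mus, kappa=4.0):
    # Response of each single-lobed component at direction v (Eq. 13).
    s = np.linalg.norm(kappa * mus + v, axis=1)
    return kappa * np.sinh(s) / (np.sinh(kappa) * s) - 1.0

rng = np.random.default_rng(1)
mus = rng.normal(size=(12, 3))
mus /= np.linalg.norm(mus, axis=1, keepdims=True)    # lobe directions mu_i
dirs = rng.normal(size=(30, 3))
dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)  # lighting directions v_n
A = np.array([mixture_row(v, mus) for v in dirs])    # N x M design matrix
w_true = np.abs(rng.normal(size=12))                 # non-negative weights
b = A @ w_true                                       # synthetic intensities
w_est, rnorm = nnls(A, b)                            # solve A W = B, W >= 0
```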
Note that this model would generally have a very large number of unknowns (depending on the resolution chosen while picking the μ_i), and thus would require a large number of ABRDF field samples (images) for accurate recovery of the ABRDFs. But since it is only used as a tool to evaluate the tensor splines, this is not considered a drawback.
6 RECOVERING SHAPE FROM ABRDF FIELD
The facial ABRDF is in part characterized by the local surface normal, and hence it should be possible to recover shape information from it. But unlike the various popular parametric reflectance models like Lambertian, Torrance-Sparrow ([51]), Phong ([50]) etc., which explicitly assume a role for the surface normal in their formulae, tensor splines make no such assumption. This allows a spatially varying and accurate approximation of the ABRDFs, but also makes the recovery of the surface normal nontrivial.
To recover the surface normal from the tensor spline model we invoke the local homogeneity of the ABRDF field. This assumption is physically sound, because the reflectance properties of a human face do not change drastically in small neighborhoods (3 × 3 pixels), and mathematically robust, because tensor splines ensure that the coefficients vary smoothly across the ABRDF lattice. We assume that the ABRDFs at two neighboring pixels have the same shape and differ only by a rotation R; thus, if the surface normal at one of these pixels is known, the surface normal at the other pixel can be derived using the rotation R.
For a given internal pixel (x, y) in the image, there are eight immediate neighbors. If the surface normal at (x, y) is inferred as described above, it would get eight suggestions for possible surface normals (assuming that the surface normals of the neighbors are known). Instead of picking one of the suggestions as its surface normal, we take a weighted geodesic average of the suggested vectors. The weights are set to be inversely proportional to the registration error obtained during rotation-alignment of the ABRDF pairs. There are two main advantages of computing the surface normal in this manner. Firstly, being an aggregate statistic, the geodesic mean is more robust to noise than the individual suggestions. Secondly, and more importantly, the weighted nature of the mean ensures that suggestions which originate from neighbors whose ABRDFs are too different in shape from the ABRDF at (x, y) are automatically weighted less. This property of the mean is especially useful at locations in the image where the homogeneity assumption is weak, e.g. shadow edges.
Fig. 6. Distribution of errors as the configuration of input changes. The X-axis represents azimuth and the Y-axis represents elevation angles. Hotter colors show larger errors. The black dots represent the exact directions of illumination in the images used as input. (This image is best viewed in color.)
Once the rotation matrices for all the pixels in the image have been computed, we initialize all the normals with the directions in which the ABRDFs have their maxima. Initialization is followed by the weighted geodesic mean computation, which provides us with a robust estimate of the surface normals. The process of mean computation can be carried out iteratively, but empirically it was noticed that good results can be obtained in all cases with 1 or 2 iterations. Note that using the maxima directly as normal estimates provides inaccurate results. We attribute this to the fact that unlike some reflectance models (e.g. Lambertian), tensor splines do not ensure that the maximal response of the ABRDF lies along the surface normal direction.
6.1 Rotation Estimation
Recovering the surface normal field using the steps described above requires the computation of rotation matrices for each pair of neighboring ABRDFs in the image. A simple but computationally intensive approach would be to search for the rotation matrix using a gradient based constrained optimization technique. More concretely, two ABRDFs, represented by their Cartesian tensor coefficients w_1 and w_2, can be aligned by minimizing the following objective function

E(R) = Σ_{v∈S²} ( w_1ᵀ B(v) − w_2ᵀ B(R·v) )²,   (14)

such that

RᵀR = I,   (15)

where the unit vectors v are obtained by some uniform sampling of the sphere, B is the vector of Cartesian tensor basis functions defined in Eq. 1, and R is the sought rotation matrix. This method for rotation matrix recovery would require the nonlinear optimization to be run ≈ 8L² times for an image of size L × L pixels. Even for an average sized image this process can be quite inefficient, and hence we propose the following more efficient algorithm for rotation matrix recovery.
Let T_1(v) and T_2(v) be the two ABRDFs (Eq. 1) that need to be aligned via a rotation. This implies that we seek a δv such that

T_1(v) = T_2(v + δv).   (16)

Since the ABRDFs are from neighboring pixels, we assume that the required δv is small and thus, using the first order Taylor expansion, we get

T_1(v) = T_2(v) + ∇T_2(v)ᵀ δv.   (17)

As we expect

Lv = v + δv,   (18)

where L is a linear transformation containing the rotation matrix, we get

T_1(v) = T_2(v) − ∇T_2(v)ᵀ v + ∇T_2(v)ᵀ Lv,   (19)

which leads to the linear system

Ax = B,   (20)

where the ith row of A contains the vectorized entries of ∇T_2(v_i) v_iᵀ, x contains the vectorized entries of L, the ith entry of B is T_1(v_i) − T_2(v_i) + ∇T_2(v_i)ᵀ v_i, and the v_i are unit vectors obtained from some uniform sampling of the sphere. The embedded rotation matrix R can be recovered from L using the QR decomposition.
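A minimal numerical sketch of Eqs. 16-20 (ours, with a toy stand-in for the ABRDF; T1 is constructed to satisfy the linearized relation exactly so that the recovery can be verified):

```python
import numpy as np

def grad(f, v, eps=1e-5):
    # Central-difference gradient of a scalar function at v.
    g = np.zeros(3)
    for k in range(3):
        e = np.zeros(3); e[k] = eps
        g[k] = (f(v + e) - f(v - e)) / (2 * eps)
    return g

def estimate_rotation(T1, T2, vs):
    # Eq. 20: row i of A is vec(grad T2(v_i) v_i^T);
    # B_i = T1(v_i) - T2(v_i) + grad T2(v_i)^T v_i.
    A = np.array([np.outer(grad(T2, v), v).ravel() for v in vs])
    B = np.array([T1(v) - T2(v) + grad(T2, v) @ v for v in vs])
    L = np.linalg.lstsq(A, B, rcond=None)[0].reshape(3, 3)
    Q, R = np.linalg.qr(L)                   # extract embedded rotation
    return Q @ np.diag(np.sign(np.diag(R)))  # fix QR sign ambiguity

rng = np.random.default_rng(0)
vs = rng.normal(size=(300, 3))
vs /= np.linalg.norm(vs, axis=1, keepdims=True)   # sphere samples

t = 0.05  # small rotation about the z-axis
R_true = np.array([[np.cos(t), -np.sin(t), 0.0],
                   [np.sin(t),  np.cos(t), 0.0],
                   [0.0,        0.0,       1.0]])
# Toy ABRDF; T1 obeys the linearized model (Eqs. 17-18) by construction.
T2 = lambda v: np.exp(2 * v[0]) + np.exp(2 * v[1]) + np.exp(2 * v[2])
T1 = lambda v: T2(v) + grad(T2, v) @ (R_true @ v - v)
Q_est = estimate_rotation(T1, T2, vs)
```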
6.2 Surface Normal Computation
As described earlier, the surface normal, n, at a pixel
(x,y) with ABRDF T can be computed by taking a
weighted geodesic mean of the normals suggested by its
Fig. 7. Per pixel intensity error comparison for the Lambertian model, 3rd order Tensor Splines, 5th order Tensor Splines and the Continuous Mixture of Exponentials. The blue color shows errors on the 1st and 2nd subsets combined, which contain lighting directions φ smaller than 25°; the green color shows the 3rd subset with 25° < φ < 50°; and the red color shows the 4th subset with 50° < φ < 70°. (This image is best viewed in color.)
neighboring pixels. Let all of its p immediate neighbors be indexed 1 ... p, with corresponding ABRDFs T_1 ... T_p, normals n_1 ... n_p, and rotation matrices, computed using the process described above, R_1 ... R_p. The normal at (x, y) is given by

n = argmin_{n'} Σ_{i=1}^{p} w_i d(R_i n_i, n'),   (21)

where the w_i are the weights described above and d(·,·) is the geodesic distance defined on the space of unit normals, i.e. arc length. We seek a geodesic mean because the domain of unit normals is the unit sphere and not a Euclidean space. This mean is also known as the Karcher mean and can be computed using the following iterative scheme:

p ← exp_p(ε v̄),   (22)

v̄ = Σ_{i=1}^{p} w_i exp_p^{-1}(R_i n_i),   (23)

where exp, the exponential map, is given as

exp_p(v) = cos(‖v‖) p + sin(‖v‖) (v/‖v‖),   (24)

and exp_p^{-1}(n_i), the log map, is defined as

exp_p^{-1}(n_i) = u arccos(⟨p, n_i⟩) / √⟨u, u⟩,   (25)

where

u = n_i − ⟨n_i, p⟩ p,   (26)

and ε is the iteration step size. For more details on computing means on manifolds see [53] and the references therein.
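The iterative scheme of Eqs. 22-26 can be sketched as follows (an illustrative implementation; the sample normals and weights in the test below are made up):

```python
import numpy as np

def exp_map(p, v):
    # Exponential map on the unit sphere at base point p (Eq. 24).
    nv = np.linalg.norm(v)
    if nv < 1e-12:
        return p
    return np.cos(nv) * p + np.sin(nv) * (v / nv)

def log_map(p, q):
    # Inverse exponential (log) map at p (Eqs. 25-26): u = q - <q,p> p.
    u = q - np.dot(q, p) * p
    nu = np.linalg.norm(u)
    if nu < 1e-12:
        return np.zeros(3)
    return u * np.arccos(np.clip(np.dot(p, q), -1.0, 1.0)) / nu

def weighted_geodesic_mean(normals, weights, step=1.0, iters=50):
    # Karcher mean iteration (Eqs. 22-23): average the log-mapped
    # normals in the tangent space, then step along the exponential map.
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    p = normals[0] / np.linalg.norm(normals[0])  # initialize at a sample
    for _ in range(iters):
        v = sum(wi * log_map(p, n) for wi, n in zip(w, normals))
        p = exp_map(p, step * v)
    return p
```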
6.3 Shape Recovery
Once the normal field has been computed, we use one of the standard techniques ([52]) to recover the surface. If z(x, y) defines the surface, the normal at a location (x, y) is given by (z_x, z_y, 1)ᵀ, where z_x and z_y denote the partial derivatives of the surface with respect to x and y. If (n_x, n_y, n_z)ᵀ denotes the surface normal at location (x, y), we have the following relations

z_x = n_x/n_z,   (27)

z_y = n_y/n_z.   (28)

Using the forward difference approximation of the partial derivatives we obtain the following two equations

n_z z(x+1, y) − n_z z(x, y) = n_x,   (29)

n_z z(x, y+1) − n_z z(x, y) = n_y,   (30)

which provide linear relations between the surface values at the grid points and the known surface normals. The surface can now be recovered by solving an overdetermined system of linear equations. At the boundary points the above formulation is not valid, and the surface is recovered by solving the following equation, obtained by eliminating n_z above:

n_y z(x+1, y) − n_y z(x, y) = n_x z(x, y+1) − n_x z(x, y).   (31)
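The overdetermined system of Eqs. 29-30 can be assembled and solved as follows (a dense toy sketch; a real implementation would use sparse matrices, and the planar normal field is synthetic, chosen to satisfy those relations exactly):

```python
import numpy as np

def integrate_normals(nx, ny, nz):
    # Stack the forward-difference relations (Eqs. 29-30) into one linear
    # system over the unknown depths z(x, y) and solve by least squares.
    h, w = nx.shape
    idx = lambda x, y: y * w + x
    A_rows, b = [], []
    for y in range(h):
        for x in range(w):
            if x + 1 < w:    # n_z z(x+1,y) - n_z z(x,y) = n_x
                row = np.zeros(h * w)
                row[idx(x + 1, y)] = nz[y, x]; row[idx(x, y)] = -nz[y, x]
                A_rows.append(row); b.append(nx[y, x])
            if y + 1 < h:    # n_z z(x,y+1) - n_z z(x,y) = n_y
                row = np.zeros(h * w)
                row[idx(x, y + 1)] = nz[y, x]; row[idx(x, y)] = -nz[y, x]
                A_rows.append(row); b.append(ny[y, x])
    z, *_ = np.linalg.lstsq(np.array(A_rows), np.array(b), rcond=None)
    z = z.reshape(h, w)
    return z - z.mean()      # depth is recovered only up to a constant

# Synthetic planar surface z = 0.5x + 0.2y on a 6x6 grid, with normals
# satisfying z_x = n_x/n_z and z_y = n_y/n_z for n_z = 1.
h = w = 6
nx = np.full((h, w), 0.5); ny = np.full((h, w), 0.2); nz = np.ones((h, w))
z_rec = integrate_normals(nx, ny, nz)
```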
6.4 Novel Pose Relighting
With the facial shape at hand, novel poses can be rendered by simply changing the viewpoint. But generating novel illumination conditions in the novel pose is not trivial, as ABRDFs estimated in a different pose cannot be directly used. If the ABRDF field was estimated in pose P1 and we wish to generate an image with novel illumination in a new pose P2, we have to rotate the ABRDFs by the same rotation which is required to change P1 to P2. Once the orientations of the ABRDFs have been rectified, images of the face in the new pose with novel illumination can be generated by sampling the ABRDF field in the desired directions.
We would like to point out that specularities are view dependent and, strictly speaking, cannot be directly transferred from one pose to another. Most of the existing Lambertian methods ignore this effect, but the few that deal with this problem handle it either by explicitly obtaining the specular component using polarized lighting (e.g. [14], [25]), which requires specialized data acquisition, or by assuming a parametric form for the specular component of lighting (e.g. [22]).
Our Cartesian tensor representation for the ABRDF does not discriminate against specularities and estimates the ABRDF as best possible from the available intensity values. Thus it should be possible to recover and manipulate the specular component separately but, at this stage, we make the assumption that specularities do not change drastically across facial poses. The validity of this assumption is supported by the results presented in the next section.
Fig. 8. Recovered shapes from 9 images: (e) 9 input images; (f) quiver plot of the normal field; (g) color encoded normals (R: x, G: z, B: y); (h) recovered shape in a novel pose. (This image is best viewed in color.)
7 EXPERIMENTAL RESULTS
In order to evaluate the proposed method for relighting and shape recovery, we conducted several detailed experiments, which are presented here. Since it has been shown for the popular Lambertian model that the space of images with illumination variation can be approximated quite accurately using a 9-dimensional subspace ([12], [13]), we have taken on the challenge of also working with just 9 images. Note that with 9 samples of the ABRDF field and the spline-based smoothness constraint, at most 10 coefficients can be recovered, and hence our central results use bicubic 3rd order tensor splines.
The experiments were carried out on the Extended Yale B [43] (28 subjects, in 9 poses and 64 illumination conditions) and CMU PIE [54] (68 subjects in 13 poses and 43 illumination conditions) benchmark databases. Note that CMU PIE has 21 usable point-source illuminated images, while in Extended Yale B all 64 illuminations are point source.
7.1 Relighting faces
We begin by noting that antisymmetric tensor splines can capture the nontrivial shapes of facial ABRDFs. In Fig. 2 we show the ABRDF field of a subject from the Extended Yale B database, estimated using 9 images. Three different regions of the face are shown in detail, where complicated shapes of the ABRDF can be noticed. Regions A and B have more complicated shapes because these ABRDFs have to accommodate shadows. The spherical functions in the image have been color coded based on the direction of maximal value. The mapping of directions to colors is provided in the lower right corner.
Next we present results for relighting of faces in novel illumination directions. In Fig. 3, four different subjects lit by various novel point-source illuminations are depicted. For the first two rows the illumination direction varies across the azimuth angle, while in the next two rows the variation is in the elevation angle. It can be noticed that our method can accurately interpolate as well as extrapolate from the images provided as input. Further, difficult effects like cast shadows and specularities have been photorealistically rendered without using any additional ray tracing.
Starting with 9 images, our technique estimates the entire ABRDF field, and thus images lit in fairly complex lighting conditions can be rendered. In Fig. 4 we present such results for two subjects from the CMU PIE database. Below each image is its lighting condition. The first image of both subjects is one of the nine input images used to estimate their ABRDF fields. The next two images for each subject are lit by light probes ([55]) named Eucalyptus Grove and St. Peter's Basilica, respectively. For color images we estimate the ABRDF field for each channel separately. The images were relit by taking a weighted combination of the point-source
Fig. 9. Detailed pose variation, with a textureless rendering in the upper right and a depth map in the lower right. Accurate renderings even in extreme poses can be noticed.
lit images where the weights were determined using the
light probes. We used 2500 samples of the light probe to
render these images.
In Fig. 5 we provide a qualitative comparison between our method and the Lambertian model. The first face image in Fig. 5 is rendered using tensor splines, the second is the ground truth and the third image is rendered using the Lambertian model. It can be readily noted that the image obtained using our method is much closer to the ground truth than the one rendered using the Lambertian model. The arrows show the locations of the important differences: cast shadows and specularities.
The next two experiments try to quantitatively capture the performance of our method. First, we explore the impact of the input data on the estimation when the tensor order is fixed (3rd order in this case). For this we use the Extended Yale B dataset, as it provides ground truth for 64 directions. To set a baseline, we estimated the ABRDF field for 10 subjects using all 64 images as input, rendered images in the same 64 directions and computed the total error with respect to the ground truth images used as input. Next, errors were computed similarly for 3 other cases where only 9 images were used as input to our method, but in different configurations. Two of the cases had images with illumination directions uniformly distributed in front of the face, while one had images with directions biased toward one side.
To visualize the distribution of the obtained errors, we color coded them, with larger errors in hotter colors and smaller errors in cooler colors, and plotted them as continuous images in Fig. 6. The X axes of these images show the azimuth angle varying from −130° to 130° (from the leftmost black dot to the rightmost) and the Y axes show the elevation angle varying from 40° to 90° (from the topmost black dot to the bottommost). The black dots in these images show the exact directions of illumination in the images used as input. It can be readily noted that when all 64 images are used as input, Fig. 6(a), the error is the least. For the 9 image cases, Figs. 6(b) and 6(c), where the illumination directions in the input images are uniformly distributed, the error is more than in 6(a) but notably less than in the case when the distribution is skewed in one direction, 6(d). Hence, as expected, our method performs better when the input images are uniformly sampled from the sphere. Moreover, the errors in all cases are concentrated towards the extreme illumination angles, and for near frontal illumination conditions the performance is not affected much by the input image distribution.
Next we present a quantitative comparison of our method with the Lambertian model and the validation model presented in Section 5. A natural question that arises is: why should an order 3 Cartesian tensor be suitable for estimating the ABRDF? To answer this question, we computed the average intensity error per pixel over all 48 subjects in the 64 illumination directions of the Extended Yale B dataset using the Lambertian model, 3rd order tensor splines, 5th order tensor splines and the continuous mixture of exponentials. All 64 illumination directions were used for the continuous mixture model (on account of its large number of unknowns), while for the other three only 9 images (configuration shown in Fig. 6(b)) were used. We set the μ_i values required for the continuous mixture model using a dense sampling (642 directions) of the unit sphere obtained by the 4th order tessellation of the icosahedron. We have presented
Fig. 10. Simultaneous pose and illumination variation. (This image is best viewed in color.)
results in Fig. 7, broken down along the standard subsets (subset 4 in red, 3 in green and 1+2 in blue) of the Extended Yale B database. As expected, the error for the subset with extreme lighting (subset 4) is larger than for the other subsets for all methods. More importantly, even with a considerably larger amount of data and a very flexible estimation model, the errors obtained from the continuous mixture of exponentials method are quite similar to those obtained from 3rd order tensor splines. This indicates that though a 3rd order tensor spline can only accommodate three lobes, for most facial ABRDFs this is sufficient. The 3rd order tensor spline outperforms the Lambertian model and even the 5th order tensor spline, which suggests possible overfitting in the latter.
7.2 Estimating Shape
All the results presented till now assumed a fixed pose, but using the technique presented in Section 6 we can simultaneously vary the illumination and pose of a face. Fig. 8 summarizes the results produced by our shape recovery algorithm for one subject each from the Extended Yale B and CMU PIE databases. The first column shows the 9 input images, the second column shows the quiver plot of the estimated normal field (zoom in to see details), the third column presents the surface normal information in a color coded form (x components of the normal field are mapped to the red channel, y components to blue and z components to green) and the fourth column shows the recovered shape in a novel pose. For the case of color images, shape estimation was carried out using only the luminance component. In both cases, occlusion of appropriate regions of the face due to the pose change can be noted from the images in the fourth column.
In Fig. 9 we present more detailed results for pose variation with a fixed illumination. The 3 rows of images show a subject from Extended Yale B in different poses, ranging from right profile to left profile as we go from left to right, and with the viewpoint varying from below the face to above the face as we go from top to bottom. Note that the ABRDF field for this subject was recovered using just 9 images under the illumination configuration shown in Fig. 6(b). The recovered shape for the same subject, rendered with constant albedo and specularities, is also presented towards the right end of the figure. This allows finer details of the shape to be shown without any texture to bias the observer. And finally, to the lower right of the figure is the height map for the same subject. It can be noted that our shape recovery algorithm can produce good results without making the simplifying Lambertian assumption.
Finally, we present results when both pose and illumination conditions are simultaneously varied. In Fig. 10, one subject each from the CMU PIE and Extended Yale B databases is shown in various poses and illumination conditions. The ABRDF fields for both cases were recovered using 9 images, and the shape for the color image was recovered using the luminance channel. With the change of pose we have retained the ABRDF field learnt in the frontal pose, but it can be noted that the results are photorealistic even though the specularities are not explicitly modified and transferred.
7.3 Face Recognition
Face recognition is one of the most popular applications of facial image analysis. It is generally defined as: given a database of facial images of various people, called the gallery, identify the person in a novel test image, called the probe, as one of the people present in the database. The degree of difficulty of this problem increases as the difference between the probe and the gallery images increases. This difference could be in illumination, occlusion, expression, pose or any combination of these. In recent times, illumination invariant face recognition has attracted particular interest due to advances in our understanding of reflectance modeling. Here we present a comparative study of illumination invariant face recognition. When using tensor splines, we assume that for each subject 9 gallery images with known illumination directions are available. From these images we compute the ABRDF field and generate images with novel illumination for a dense sampling of directions. This step expands our collection of 9 gallery images to any desired size. The probe image is then matched to all the images in the database, and the subject with the closest matching image is assumed to be the correct identity of the probe image.
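Schematically, the recognition step reduces to nearest-neighbor search over the relighting-expanded gallery (a toy sketch of ours; the random vectors stand in for rendered images, and the renderer itself is not reproduced here):

```python
import numpy as np

def recognize(probe, expanded_gallery):
    # expanded_gallery maps subject id -> array of rendered image vectors;
    # the probe is assigned the identity of its nearest gallery image.
    best_id, best_dist = None, np.inf
    for sid, images in expanded_gallery.items():
        d = np.linalg.norm(images - probe, axis=1).min()
        if d < best_dist:
            best_id, best_dist = sid, d
    return best_id

rng = np.random.default_rng(0)
centers = {0: np.zeros(50), 1: np.full(50, 5.0)}        # two "subjects"
gallery = {s: c + 0.1 * rng.normal(size=(40, 50)) for s, c in centers.items()}
probe = centers[0] + 0.1 * rng.normal(size=50)
```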
We have used Extended Yale B data for this exper
iment primarily because most of the existing methods
have presented face recognition results on the same
database. As mentioned before, the database is divided
into 4 subsets, with the lighting getting more and more extreme as we go from subset 1 to 4; thus, the difficulty of classifying images from these subsets also increases. The obtained recognition error rates are reported in Table 2. We have also presented results reported by existing methods; their respective references are listed next to the method names. Results for the first seven techniques were taken from [7] and the rest were taken from their respective references. Along with the error rates we have also listed the number of images required by each method in the gallery set. For our method we used the nine images in the configuration shown in Fig. 6(b). It can be noted that even with the naive nearest neighbor classification strategy our method produces near perfect results.
8 CONCLUSIONS, LIMITATIONS AND FUTURE
WORK
In this paper we have presented a novel comprehensive system for capturing the reflectance properties and shape of human faces using antisymmetric tensor splines. This method can be used to synthesize photorealistic images of faces in novel poses and illumination conditions. Our method requires as few as 9 images with known illumination directions in order to fully recover the model parameters, and the accuracy of the approximation improves as more data is made available. The output of this technique is a continuously varying field of ABRDFs which can be used for, besides relighting and shape recovery, meaningfully upsampling the image and succinctly describing the ABRDF field of the face. It should be noted that our method does not require the input images to be free of attached or cast shadows.
As compared to the ideal solution described in the
introduction, we work with multiple images lit with
TABLE 2
Face recognition error rates. N is the number of required input images.

Method               | N | Subset 1&2 | Subset 3 | Subset 4 | Total
Correlation [3]      | 4 | 0.0        | 23.3     | 73.6     | 29.1
Eigenfaces [8]       | 6 | 0.0        | 25.8     | 75.7     | 30.4
Linear subspace [2]  | 7 | 0.0        | 0.0      | 15.0     | 4.7
Cones-attached [10]  | 7 | 0.0        | 0.0      | 8.6      | 2.7
Cones-cast [10]      | 7 | 0.0        | 0.0      | 0.0      | 0.0
9PL [43]             | 9 | 0.0        | 0.0      | 2.8      | 0.8
3D SH [7]            | 1 | 0.0        | 0.0      | 2.8      | 0.8
Harmonic (SFS) [1]   | 1 | 0.0        | 0.0      | 12.8     | 4.0
Tensor Splines       | 9 | 0.0        | 0.0      | 1.6      | 0.5
point sources in known illumination direction. Our fu
ture work would focus on recovering the ABRDF field
without requiring multiple images with known illumi
nation. Further, while relighting novel poses, we assume
that the specular component of the ABRDF remains
the same which is not precise. As we recovered the
surface normals from the nontrivial shapes of ABRDFs,
we believe that the specular component can also be
recovered, but that is yet to be explored. Lastly, in
order to avoid over fitting noisy and sparse data, we
built antisymmetric tensor spline with inherent local but
uniform smoothness constraints, but ideally one would
like to have more control over this and it is something
that we also wish to explore in future. Note that these
assumptions are loosening of a few of the constraints we
set out to satisfy while looking for an "ideal" solution
but almost all existing techniques make similar or more
restrictive assumptions (Table. 1).
In conclusion, antisymmetric tensor splines provide
a useful framework for the analysis and modeling of
illumination and pose variation in facial images, with
various possible applications such as relighting, pose
change, face recognition, upsampling, and compression.
Our results also suggest that the collective analysis of
shape and reflectance through ABRDFs is a promising
alternative to separate facial BRDF and shape analysis.
ACKNOWLEDGMENT
A part of the work presented here on relighting appeared
in the Proceedings of IEEE CVPR 2008 [29]. This research
was in part funded by the University of Florida Alumni
Fellowships to Ritwik Kumar and Angelos Barmpoutis.
REFERENCES
[1] W. A. P. Smith and E. R. Hancock, "Facial Shape-from-shading and Recognition Using Principal Geodesic Analysis and Robust Statistics", IJCV, 76:71-91, 2008.
[2] A. U. Batur and M. Hayes, "Linear subspaces for illumination robust face recognition", CVPR, 2:296-301, 2001.
[3] R. Brunelli and T. Poggio, "Face recognition: features versus templates", PAMI, 15(10):1042-1052, Oct 1993.
[4] Z. Wen, Z. Liu and T. Huang, "Face Relighting with Radiance Environment Maps", CVPR, pages 158-165, 2003.
[5] M. Chandraker, S. Agarwal and D. Kriegman, "ShadowCuts: Photometric stereo with shadows", CVPR, 2007.
[6] S. K. Zhou, G. Aggarwal, R. Chellappa and D. W. Jacobs, "Appearance Characterization of Linear Lambertian Objects, Generalized Photometric Stereo, and Illumination-Invariant Face Recognition", PAMI, 29(2):230-245, February 2007.
[7] L. Zhang and D. Samaras, "Face Recognition from A Single Training Image under Arbitrary Unknown Lighting using Spherical Harmonics", PAMI, 28(3):351-363, March 2006.
[8] P. Hallinan, "A low-dimensional representation of human faces for arbitrary lighting conditions", CVPR, pages 995-999, 1994.
[9] V. Blanz and T. Vetter, "A morphable model for the synthesis of 3D faces", SIGGRAPH, pp. 187-194, 1999.
[10] A. Georghiades, P. Belhumeur and D. Kriegman, "From Few to Many: Illumination Cone Models for Face Recognition Under Variable Lighting and Pose", PAMI, 23(6):643-660, 2001.
[11] R. Basri, D. Jacobs and I. Kemelmacher, "Photometric Stereo with General, Unknown Lighting", IJCV, 72(3):239-257, 2007.
[12] R. Basri and D. Jacobs, "Lambertian reflectances and linear subspaces", PAMI, 25(2):218-233, 2003.
[13] R. Ramamoorthi and P. Hanrahan, "A Signal-Processing Framework for Inverse Rendering", SIGGRAPH, pp. 117-128, 2001.
[14] T. Zickler, R. Ramamoorthi, S. Enrique and P. Belhumeur, "Reflectance Sharing: Predicting Appearance from a Sparse Set of Images of a Known Shape", PAMI, 28(8), August 2006.
[15] C. Hernández, G. Vogiatzis, G. Brostow, B. Stenger and R. Cipolla, "Non-rigid Photometric Stereo with Colored Lights", ICCV, 2007.
[16] N. Alldrin and D. Kriegman, "Shape from Varying Illumination and Viewpoint", ICCV, 2007.
[17] N. Alldrin, T. Zickler and D. Kriegman, "Photometric Stereo With Non-Parametric and Spatially-Varying Reflectance", CVPR, 2008.
[18] W. M. Silver, "Determining Shape and Reflectance Using Multiple Images", Master's thesis, MIT, 1980.
[19] A. Hertzmann and S. M. Seitz, "Shape and materials by example: A photometric stereo approach", CVPR, 2003.
[20] A. Hertzmann and S. M. Seitz, "Example-based photometric stereo: Shape reconstruction with general, varying BRDFs", PAMI, 27(8):1254-1264, 2005.
[21] A. S. Georghiades, P. N. Belhumeur and D. J. Kriegman, "Illumination-Based Image Synthesis: Creating Novel Images of Human Faces Under Differing Pose and Lighting", Proceedings of the IEEE Workshop on Multi-View Modeling and Analysis of Visual Scenes, page 47, 1999.
[22] A. S. Georghiades, "Recovering 3-D Shape and Reflectance From a Small Number of Photographs", Eurographics Symposium on Rendering, June 2003, pp. 230-240, 315.
[23] D. B. Goldman, B. Curless, A. Hertzmann and S. M. Seitz, "Shape and Spatially-Varying BRDFs from Photometric Stereo", ICCV, pp. 341-348, 2005.
[24] J. Lee, B. Moghaddam, H. Pfister and R. Machiraju, "A Bilinear Illumination Model for Robust Face Recognition", ICCV, pp. 1177-1184, 2005.
[25] P. Debevec, T. Hawkins, C. Tchou, H. Duiker, W. Sarokin and M. Sagar, "Acquiring the Reflectance Field of a Human Face", SIGGRAPH, pp. 145-156, 2000.
[26] S. Magda, T. Zickler, D. Kriegman and P. Belhumeur, "Beyond Lambert: Reconstructing Surfaces with Arbitrary BRDFs", ICCV, 2001.
[27] Z. Yue, W. Zhao and R. Chellappa, "Pose-Encoded Spherical Harmonics for Face Recognition and Synthesis Using a Single Image", EURASIP Journal on Applied Signal Processing, 2008.
[28] W. Zhao and R. Chellappa, "Symmetric Shape from Shading Using Self-Ratio Image", IJCV, vol. 45, pp. 55-75, 2001.
[29] A. Barmpoutis, R. Kumar, B. C. Vemuri and A. Banerjee, "Beyond the Lambertian Assumption: A generative model for Apparent BRDF fields of Faces using Anti-Symmetric Tensor Splines", CVPR, 2008.
[30] S. Biswas, G. Aggarwal and R. Chellappa, "Robust Estimation of Albedo for Illumination-invariant Matching and Shape Recovery", ICCV, 2007.
[31] K. C. Lee and B. Moghaddam, "A practical face relighting method for directional lighting normalization", International Workshop on Analysis and Modeling of Faces and Gestures, 2005.
[32] A. Shashua and T. Riklin-Raviv, "The quotient image: Class-based re-rendering and recognition with varying illuminations", PAMI, 23(2):129-139, 2001.
[33] R. Gross, I. Matthews and S. Baker, "Appearance-Based Face Recognition and Light-Fields", PAMI, 26(4):449-465, 2004.
[34] H. Chan and W. W. Bledsoe, "A Man-Machine Facial Recognition System: Some Preliminary Results", Panoramic Research Inc., Palo Alto, 1965.
[35] A. Shashua, "On Photometric Issues in 3D Visual Recognition from a Single 2D Image", IJCV, vol. 21, pp. 99-122, 1997.
[36] A. L. Yuille, D. Snow, R. Epstein and P. N. Belhumeur, "Determining Generative Models of Objects under Varying Illumination: Shape and Albedo from Multiple Images Using SVD and Integrability", IJCV, vol. 35, pp. 203-222, 1999.
[37] P. N. Belhumeur and D. J. Kriegman, "What Is the Set of Images of an Object under All Possible Illumination Conditions?", IJCV, vol. 28, pp. 245-260, 1998.
[38] T. Malzbender, D. Gelb and H. Wolters, "Polynomial Texture Maps", SIGGRAPH, pages 519-528, 2001.
[39] R. Ramamoorthi and P. Hanrahan, "On the Relationship between Radiance and Irradiance: Determining the Illumination from Images of a Convex Lambertian Object", Journal of the Optical Society of America, vol. 18, pp. 2448-2459, 2001.
[40] P. Belhumeur, D. Kriegman and A. Yuille, "The Bas-Relief Ambiguity", IJCV, vol. 35, no. 1, pp. 33-44, 1999.
[41] J. P. O'Shea, M. S. Banks and M. Agrawala, "The Assumed Light Direction for Perceiving Shape from Shading", ACM Applied Perception in Graphics and Visualization (APGV), August 2008, pp. 135-142.
[42] Z. Liu, Y. Shan and Z. Zhang, "Expressive expression mapping with ratio images", SIGGRAPH, pages 271-276, 2001.
[43] K. Lee, J. Ho and D. Kriegman, "Acquiring Linear Subspaces for Face Recognition under Variable Lighting", PAMI, 27(5):684-698, 2005.
[44] G. Borshukov and J. P. Lewis, "Realistic Human Face Rendering for 'The Matrix Reloaded'", Sketches and Applications Program, SIGGRAPH, 2003.
[45] S. C. Foo, "A gonioreflectometer for measuring the bidirectional reflectance of material for use in illumination computation", Master's thesis, Cornell University, Ithaca, 1997.
[46] T. Weyrich, W. Matusik, H. Pfister, B. Bickel, C. Donner, C. Tu, J. McAndless, J. Lee, A. Ngan, H. W. Jensen and M. Gross, "Analysis of Human Faces using a Measurement-Based Skin Reflectance Model", SIGGRAPH, Boston, 2006.
[47] C. de Boor, "On Calculating with B-Splines", Journal of Approximation Theory, 6:50-62, 1972.
[48] B. Jian and B. C. Vemuri, "Multi-fiber reconstruction from Diffusion MRI using Mixture of Wisharts", IPMI, pp. 384-395, 2007.
[49] C. Lawson and R. J. Hanson, "Solving Least Squares Problems", Prentice-Hall, Englewood Cliffs, 1974.
[50] B. T. Phong, "Illumination for computer generated images", Communications of the ACM, 18(6):311-317, June 1975.
[51] K. E. Torrance and E. M. Sparrow, "Theory for off-specular reflection from roughened surfaces", J. Opt. Soc. Am., 57:1105-1114, 1967.
[52] B. K. P. Horn, "Robot Vision", MIT Press, 1986.
[53] A. Srivastava, I. Jermyn and S. Joshi, "Riemannian Analysis of Probability Density Functions with Applications in Vision", CVPR, June 2007.
[54] T. Sim, S. Baker and M. Bsat, "The CMU Pose, Illumination, and Expression (PIE) Database of Human Faces", Technical Report CMU-RI-TR-01-02, Robotics Institute, Carnegie Mellon University, Pittsburgh, PA, January 2001.
[55] P. Debevec, "Rendering Synthetic Objects Into Real Scenes: Bridging Traditional and Image-Based Graphics With Global Illumination and High Dynamic Range Photography", Proceedings of SIGGRAPH 98, pp. 189-198.
