<%BANNER%>

Sparsity Based Image and Video Processing

Permanent Link: http://ufdc.ufl.edu/UFE0041971/00001

Material Information

Title: Sparsity Based Image and Video Processing
Physical Description: 1 online resource (152 p.)
Language: english
Creator: Xu, Jun
Publisher: University of Florida
Place of Publication: Gainesville, Fla.
Publication Date: 2010

Subjects

Subjects / Keywords: image, sparsity, transform, video
Electrical and Computer Engineering -- Dissertations, Academic -- UF
Genre: Electrical and Computer Engineering thesis, Ph.D.
bibliography   ( marcgt )
theses   ( marcgt )
government publication (state, provincial, terriorial, dependent)   ( marcgt )
born-digital   ( sobekcm )
Electronic Thesis or Dissertation

Notes

Abstract: The wide usage of digital multimedia consumer electronics leads to the rapid explosion of the amount of image and video data for sharing, storage and transmission over networks. How to find efficient algorithms to process these data raises great challenges. Sparsity is a useful measurement of the efficiency. In this dissertation, we address these problems based on exploring the sparsity of data representations from data independent and data dependent perspectives. In the first part of this dissertation, we study image representation using data independent approaches. For image representation, transforms can be used to convert original data into transform coefficients to eliminate the correlation among original data. We focus on images with singularities, which are very difficult to represent by conventional transforms. We propose Ripplet transform type I (Ripplet-I), which generalizes Curvelet transform by introducing support and degree parameters. Curvelet transform is just a special case of Ripplet-I transform with $c=1,d=2$. Ripplet-I transform can represent images with singularities along arbitrary curves efficiently. Ripplet-I transform achieves sparser representation than Curvelet transform and demonstrates superior performance in applications such as image compression and image denoising. Following the strategy of Ridgelet transform, we propose Ripplet transform type II (Ripplet-II) based on generalized Radon transform. Ripplet-II transform maps singularities along curves of images in the pixel domain to point singularities in the generalized Radon domain. Point singularities are further represented sparsely by wavelet transform. Ripplet-II transform has demonstrated better performance in image classification than conventional transforms. In the second part of this dissertation, we study performance improving approaches by enhancing sparsity of image representations using the information from data. These approaches are data dependent and vary from case to case. To remove artifacts introduced by a video codec, we enhance the sparse representation of true signals to remove artifacts and preserve true signals; to improve video coding efficiency, we provide better intra prediction and adaptive block pattern to enhance the sparsity of residuals; to provide videos with smooth quality, we propose sparsity based rate control algorithm for video coding with constraints on distortion fluctuations.
General Note: In the series University of Florida Digital Collections.
General Note: Includes vita.
Bibliography: Includes bibliographical references.
Source of Description: Description based on online resource; title from PDF title page.
Source of Description: This bibliographic record is available under the Creative Commons CC0 public domain dedication. The University of Florida Libraries, as creator of this bibliographic record, has waived all rights to it worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law.
Statement of Responsibility: by Jun Xu.
Thesis: Thesis (Ph.D.)--University of Florida, 2010.
Local: Adviser: Wu, Dapeng.
Electronic Access: RESTRICTED TO UF STUDENTS, STAFF, FACULTY, AND ON-CAMPUS USE UNTIL 2011-02-28

Record Information

Source Institution: UFRGP
Rights Management: Applicable rights reserved.
Classification: lcc - LD1780 2010
System ID: UFE0041971:00001

Permanent Link: http://ufdc.ufl.edu/UFE0041971/00001

Material Information

Title: Sparsity Based Image and Video Processing
Physical Description: 1 online resource (152 p.)
Language: english
Creator: Xu, Jun
Publisher: University of Florida
Place of Publication: Gainesville, Fla.
Publication Date: 2010

Subjects

Subjects / Keywords: image, sparsity, transform, video
Electrical and Computer Engineering -- Dissertations, Academic -- UF
Genre: Electrical and Computer Engineering thesis, Ph.D.
bibliography   ( marcgt )
theses   ( marcgt )
government publication (state, provincial, terriorial, dependent)   ( marcgt )
born-digital   ( sobekcm )
Electronic Thesis or Dissertation

Notes

Abstract: The wide usage of digital multimedia consumer electronics leads to the rapid explosion of the amount of image and video data for sharing, storage and transmission over networks. How to find efficient algorithms to process these data raises great challenges. Sparsity is a useful measurement of the efficiency. In this dissertation, we address these problems based on exploring the sparsity of data representations from data independent and data dependent perspectives. In the first part of this dissertation, we study image representation using data independent approaches. For image representation, transforms can be used to convert original data into transform coefficients to eliminate the correlation among original data. We focus on images with singularities, which are very difficult to represent by conventional transforms. We propose Ripplet transform type I (Ripplet-I), which generalizes Curvelet transform by introducing support and degree parameters. Curvelet transform is just a special case of Ripplet-I transform with $c=1,d=2$. Ripplet-I transform can represent images with singularities along arbitrary curves efficiently. Ripplet-I transform achieves sparser representation than Curvelet transform and demonstrates superior performance in applications such as image compression and image denoising. Following the strategy of Ridgelet transform, we propose Ripplet transform type II (Ripplet-II) based on generalized Radon transform. Ripplet-II transform maps singularities along curves of images in the pixel domain to point singularities in the generalized Radon domain. Point singularities are further represented sparsely by wavelet transform. Ripplet-II transform has demonstrated better performance in image classification than conventional transforms. In the second part of this dissertation, we study performance improving approaches by enhancing sparsity of image representations using the information from data. These approaches are data dependent and vary from case to case. To remove artifacts introduced by a video codec, we enhance the sparse representation of true signals to remove artifacts and preserve true signals; to improve video coding efficiency, we provide better intra prediction and adaptive block pattern to enhance the sparsity of residuals; to provide videos with smooth quality, we propose sparsity based rate control algorithm for video coding with constraints on distortion fluctuations.
General Note: In the series University of Florida Digital Collections.
General Note: Includes vita.
Bibliography: Includes bibliographical references.
Source of Description: Description based on online resource; title from PDF title page.
Source of Description: This bibliographic record is available under the Creative Commons CC0 public domain dedication. The University of Florida Libraries, as creator of this bibliographic record, has waived all rights to it worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law.
Statement of Responsibility: by Jun Xu.
Thesis: Thesis (Ph.D.)--University of Florida, 2010.
Local: Adviser: Wu, Dapeng.
Electronic Access: RESTRICTED TO UF STUDENTS, STAFF, FACULTY, AND ON-CAMPUS USE UNTIL 2011-02-28

Record Information

Source Institution: UFRGP
Rights Management: Applicable rights reserved.
Classification: lcc - LD1780 2010
System ID: UFE0041971:00001


This item has the following downloads:


Full Text





SPARSITY BASED IMAGE AND VIDEO PROCESSING


By

JUN XU



















A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL
OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT
OF THE REQUIREMENTS FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA


2010



























2010 Jun Xu


































To my beloved parents









ACKNOWLEDGMENTS

First of all, I am heartily thankful to my advisor, Professor Dapeng Oliver Wu,

whose encouragement, guidance and support enlighten the development of my research.

His enthusiasm, his inspiration and his great efforts to explain things clearly and simply

established a role model for me. He has made available his support in a number of v-- i-

encouragement, sound advice, good ', 11;i:.- and lots of good ideas. This thesis would not

have been possible without his insightful guidance and strict training in creative thinking,

logical reasoning and writing skills.

I would also like to thank Professor Tao Li, Professor Yijun Sun and Professor Scott

Banks for serving on my dissertation committee and providing valuable feedbacks on my

research. I am indebted to my Master advisor Professor Jingli Zhou for bringing me to the

world of image and video processing.

I am thankful to my officemate in Multimedia Communications and Networking Lab

at UF. It is great fortune for me to join this friendly family. I would like to thank senior

lab member Dr. Jieyan Fan for his help and advice in my early di,-4 in US. Thanks to

Zhifeng C(!, i. Bing Han, Taoran Lu, and Wenxing Ye for useful discussions in many

research problems. I want to thank Dr. Xiaochen Li, Xihua Dong, Yiran Li and Yunzhao

Li for their support and friendship. My appreciation also goes to Shanshan Ren, Ziyi

Wang and Lin Zhang for their support, 1,-i iiii and generosity to this group. Thanks

to Dr. Huilin Xu, Wenshu Z!i i,.- and Dongliang Duan. I would like to thank Yan Li

and Lu C(! ii, my friends for over ten years, for their kindness and support ever since we

met. It was real great to meet and know all these friends and spend wonderful four years

here with them. I will cherish every moment that we were together. I also want to thank

Zongrui Ding, Lei Yang, Qian C!('! i Y ii i He, Yakun Hu, Ji Ir-i-i :1I Wi-,:- and Zheng

Yuan.









I am grateful to Dr. Peng Yin, my intern mentor in Thomson Corporate Research, for

her guidance and support. I would also like to thank Dr. Yunfei Zheng for his help and

support.

Finally, I owe my deepest gratitude to my parents. Without their understanding,

endless patience and constant support, I would have not been able to make my dream

come true. They bore me, raised me, taught me, supported me and loved me. To them I

dedicate this dissertation.









TABLE OF CONTENTS


page


ACKNOW LEDGMENTS .................................

LIST O F TABLES . . .

LIST OF FIGURES . . .


ABSTRACT ..............


CHAPTER


1 INTRODUCTION .....


Motivation .......
Outline .........


2 RIPPLET-I TRANSFORM


Introduction .. ............
Continuous Curvelet Transform ...
Continuous Ripplet-I Transform ...
2.3.1 Ripplets ............
2.3.2 Continuous Ripplet-I Transform
Discrete Ripplet-I Transform ......
Tight Frame .. ............
Experimental Results .. ........
2.6.1 Nonlinear Approximation .
2.6.1.1 Ripplet-I with different
2.6.1.2 Comparison with other
2.6.2 Image Compression ....


degrees
transforms


2.6.2.1 Comparison on cropped images .....
2.6.2.2 Comparison on texture-tich images .
2.6.2.3 Comparison on natural images ......
2.6.3 Image Denoising .. ................
2.7 Sum m ary . .

3 RIPPLET-II TRANSFORM .. ................

3.1 Introduction . .
3.2 Generalized Radon Transform .. ............
3.3 Ripplet-II Transform .. ................
3.3.1 Continuous Ripplet-II Transform .. ........
3.3.1.1 Forward transform .. ..........
3.3.1.2 Inverse transform .. ...........
3.3.2 Continuous Orthogonal Ripplet-II Transform .
3.3.3 Discrete Ripplet-II Transform .. .........


.............................:









3.3.4 Discrete Ripplet-II Transform with d = 2 ..... 61
3.4 Properties of Ripplet-II Transform ............ ... .. 62
3.5 Experimental Results .................. ........... .. 66
3.5.1 Texture Classification .................. .. .... .. .. 66
3.5.2 Image Retrieval .................. .......... .. 70
3.6 Summ ary .................. ............... .. .. 72

4 SPARSITY BASED DE-ARTIFACTING IN VIDEO CODING ... 73

4.1 Introduction .................. ................ .. 73
4.2 Fram work .. .. ... .. 76
4.3 Algorithm Description .................. .......... .. 76
4.3.1 Similar Patch Set Formulation ................ .. 78
4.3.2 Sparsity Enforcement .................. ....... .. 78
4.3.3 Multiple Hypothesis Fusion ................ .... .. 80
4.3.4 Application to Compression Artifact Removal ... 81
4.4 Complexity Analysis .................. ........... .. 81
4.4.1 Searching Complexity .................. ..... .. 82
4.4.2 Transform Complexity .................. ..... .. 83
4.5 Performance Comparison .................. ......... .. 83
4.5.1 2D or 3D patch .................. .......... .. 84
4.5.2 Transform Dimensionality ................ .... .. 85
4.5.3 Quantization Parameters .................. ... .. 85
4.5.4 Visual Quality .................. ........... .. 86
4.6 Summ ary .................. ............... .. .. 86

5 SPARSITY ENHANCED VIDEO CODING TECHNIQUES .. 90

5.1 Introduction .................. ................ .. 90
5.2 Adaptive Block Pattern .................. ......... .. 93
5.2.1 Heterogeneous Block Pattern ................. .. 93
5.2.2 Implementation Details .................. ..... .. 94
5.3 Enhanced Intra Prediction .................. ........ .. 94
5.3.1 Intra Similar-block Search ................ .... .. 94
5.3.2 Implementation Details .................. ..... .. 97
5.4 Experimental Results .................. ........... .. 97
5.4.1 Heterogeneous Block Pattern ................. .. 98
5.4.2 Enhanced Intra Prediction ................ .... .. 98
5.4.3 Combination of Algorithms ................ .... .. 98
5.5 Summ ary .................. ............... .. .. 99

6 SPARSITY BASED RATE CONTROL IN VIDEO CODING .... 105

6.1 Introduction .................. ................ .. 105
6.2 Gamma Rate Theory .................. ........... 108
6.2.1 Reverse Water-filling .................. .. .... 108
6.2.2 Gamma Rate Theory .................. ..... .. 110









6.2.3 Simulation Results ............... ....... .. 118
6.3 Sparsity Based Rate Control ............... .... .. 118
6.3.1 Rate Control in Video Coding ..... ... 118
6.3.2 Rate Control without Constraints on Distortion Fluctuation 121
6.3.2.1 Algorithm framework ............. .. .. 121
6.3.2.2 R-Q model ............... .... .. 125
6.3.2.3 Implementation details ............. .. .. 127
6.3.2.4 Experimental results ... 128
6.3.3 Rate Control with Constraints on Distortion Fluctuation ...... 132
6.3.3.1 Algorithm framework ............... 132
6.3.3.2 Implementation details .............. .. 136
6.3.3.3 Experimental results ........ 137
6.4 Summ ary .................. ............... .. .. 140

7 CONCLUSION .................. ................. .. 142

7.1 Summ ary .................. ............... .. .. 142
7.2 Future W ork .................. ................ .. 144

REFERENCES .................. ................ .. .. 145

BIOGRAPHICAL SKETCH ................... ... 152









LIST OF TABLES


Table page

2-1 PSNR comparison of Ripplet-I and JPEG2000 at 0.03125 bpp ... 40

2-2 Average PSNR gain of ripplet-I (c = 1, d = 3) based codec, compared to JPEG
and JPEG2000, respectively .................. .......... .. 41

3-1 Information extracted from data .................. ........ .. 66

3-2 Error rate under different transforms using feature extraction 1 ... 69

3-3 Error rate under different transforms using feature extraction 2 ... 70

3-4 Average retrieval rate under different transforms using feature extraction 1 71

3-5 Average retrieval rate under different transforms using feature extraction 2 71

4-1 Performance (PSNR gain in dB) of 2D and 3D patches in videos with different
motion characteristics. .................. .. ........ 84

5-1 Syntax added to MB header. ................... ........ 94

5-2 Syntax added to MB header. ................... ........ 97

5-3 Encoding configuration .................. ............. .. 98

6-1 Model accuracy for R D .................. ........... 125

6-2 Performance comparison of H.264 and proposed algorithm using Gaussian model 131

6-3 Performance comparison of H.264 and proposed algorithm using Laplacian model 132

6-4 Performance comparison of H.264 and proposed algorithm using C ,1. !:1 model 133









LIST OF FIGURES


Figure page

2-1 The tiling of polar frequency domain. The shadowed 'wedge' corresponds to the
frequency transform of the element function. .................. 25

2-2 Ripplet-I functions in spatial domain with different degrees and supports, which
are all located in the center, i.e., b = 0. ................ .. .. 28

2-3 The comparison of coefficients between ripplet-I transform and wavelet transform. 36

2-4 Test images .................. ................... .. 38

2-5 Comparing nonlinear approximation performance of ripplets with fixed support
and different degrees corresponding to the test images in Fig. 2-4. ... 39

2-6 Performance comparisons. .................. .. ....... 43

2-7 The visual quality comparison between ripplet-I based image codec and JPEG2000
for a patch cropped from 'barbara', when bpp is equal to 0.3 .... 44

2-8 Texture-rich images used in our experiment ................ 45

2-9 PSNR vs. bpp for ripplet-I based image codec, JPEG and JPEG2000. 46

2-10 The visual quality comparison between ripplet-I based image codec and JPEG2000
for 'mandrill', when bpp is equal to 0.25. ................ ..... 47

2-11 A scale-up details of denoised test image 'barbara'. The standard variance of
noise is 15 .................. ............... ... ..48

3-1 Curves defined by Eq. (3-3) in Cartesian coordinates. (a) d = 1. (b) d = 2. (c)
d = 3. (d) d = -1. (e) d = -2. (f) d = -3 .......... ...... .. 52

3-2 Ripplet-II functions in Cartesian coordinates (xi, x2) (a) a = 1, b = 0, d = 2
and 0 = 0 .(b)a = 2, b = 0, d = 2 and 0 = 0 .(c) a = 1, b = 0.05, d = 2 and
0= 0 .(d)a = 1, b= 0, d = 2 and 0 = 30 .................. .. 55

3-3 Ripplet-II functions in Cartesian coordinates (x, x2) (a) a = 1, b = 0, d = 1
and 0 = 0 .(b)a = 2, b = 0, d = 1 and 0 = 0 .(c) a = 1, b = 0.05, d = 1 and
0= 0 .(d)a = 1, b= 0, d = 1 and 0 = 30 .................. .. 56

3-4 Gaussian images with a curve edge. Top row: original image f(x, y); Middle
row: Magnitude of 2D Fourier transform; Bottom row: Magnitude of 2D Fourier
transform after substituting the polar coordinate (r', 0') with (V/r, 0/2). Left
column: Parabolic curve. Middle column: Curve with degree 3. Right column:
Curve with degree 4 .................. .............. .. 60

3-5 Ripplet-II with different degrees. .................. ..... 63










3-6 Comparison of coefficient decaying between wavelet, ridgelet, ripplet-II and orthogonal
ripplet-II with d = 2. .................. ... ......... 65

3-7 Textures used in texture classification. .................. .... 67

3-8 Textures rotated with different angles. .................. .... 68

4-1 Coding artifacts: (A) Edge distortion. (B) Ringing effects. (C) Texture distortion.
(D) Blocky artifacts. ............... .............. .. 74

4-2 2D and 3D patches in a video sequence. ................ ..... 77

4-3 Flowchart of the proposed framework. 2D patches are used for demonstration. .77

4-4 Patch sorting and packing. The degree of gray in patches indicates the similarity.
The lower the degree is, the more similar the path is to the reference patch. 79


4-5 Deartifacting filter as a post-processing tool in a video encoder .. ...

4-6 Deartifacting filter as an in-loop tool in a video encoder .. ......

4-7 Performance of different transform dimensions .. ............

4-8 Performance for different quantization parameters .. .........

4-9 Visual comparison of detailed crops. Left column is coded by H.264. Righ
is filtered by proposed filter ....................

5-1 Hybrid video coding diagram ...................

5-2 Partition of M B ........................

5-3 Intra prediction directions for 4x4 blocks ............

5-4 Block pattern comparison. (A) Homogeneous block pattern in H.264. (B)
heterogeneous block pattern ................ .....

5-5 Example of similar MBs. Two MBs indicated by block rectangles are very
to each other . .. ....

5-6 RD plots of H.264 and proposed algorithm without overhead .

5-7 RD plots of H.264 and proposed algorithm with overhead ...

5-8 RD plots of H.264 and proposed algorithm without overhead .

5-9 RD plots of H.264 and proposed algorithm with overhead ...

5-10 RD plots of H.264 and proposed algorithm without overhead .

5-11 RD plots of H.264 and proposed algorithm with overhead ...


t column


Proposed
. 95

similar
. 96

. 99

. 100

. 101

. 102

. 103

. 104









6-1 Data compression system diagram. .................. ..... 105

6-2 Rate-distortion function for a Gaussian source. Achievable rate-distortion region
is the gray area .................. ................. .. 106

6-3 Reverse water-filling for 7 Gaussian sources ............ .. .. 109

6-4 Controllable region in R-7D plane. .................. ... 111

6-5 F(R) function ............... .............. .. 119

6-6 The relation between R and D .................. ......... 126

6-7 Encoding process with rate control in H.264. ............ .. 128

6-8 Rate distortion comparison between H.264 and proposed algorithm with different
m odels. .................. .................. .. .. 130

6-9 Rate distortion comparisons with 7D = 0.8dB. .................. 138

6-10 R vs 7D of 'hall monitor' by in,,..I;' ,l H.264. .............. .. 139

6-11 R vs 7D of 'hall monitor' by proposed algorithm. ............... 140









Abstract of Dissertation Presented to the Graduate School
of the University of Florida in Partial Fulfillment of the
Requirements for the Degree of Doctor of Philosophy

SPARSITY BASED IMAGE AND VIDEO PROCESSING

By

Jun Xu

August 2010

C'!I ir: Dapeng Wu
Major: Electrical and Computer Engineering

The wide usage of digital multimedia consumer electronics leads to the rapid

explosion of the amount of image and video data for -i Ii i_. storage and transmission

over networks. How to find efficient algorithms to process these data raises great

challenges. Sparsity is a useful measurement of the efficiency. In this dissertation, we

address these problems based on exploring the sparsity of data representations from data

independent and data dependent perspectives.

In the first part of this dissertation, we study image representation using data

independent approaches. For image representation, transforms can be used to convert

original data into transform coefficients to eliminate the correlation among original data.

We focus on images with singularities, which are very difficult to represent by conventional

transforms.

We propose Ripplet transform type I (Ripplet-I), which generalizes Curvelet

transform by introducing support and degree parameters. Curvelet transform is just a

special case of Ripplet-I transform with c = 1, d = 2. Ripplet-I transform can represent

images with singularities along arbitrary curves efficiently. Ripplet-I transform achieves

sparser representation than Curvelet transform and demonstrates superior performance in

applications such as image compression and image denoising.

Following the strategy of Ridgelet transform, we propose Ripplet transform type

II (Ripplet-II) based on generalized Radon transform. Ripplet-II transform maps









singularities along curves of images in the pixel domain to point singularities in the

generalized Radon domain. Point singularities are further represented sparsely by

wavelet transform. Ripplet-II transform has demonstrated better performance in image

classification than conventional transforms.

In the second part of this dissertation, we study performance improving approaches

by enhancing sparsity of image representations using the information from data. These

approaches are data dependent and vary from case to case. To remove artifacts introduced

by a video codec, we enhance the sparse representation of true signals to remove artifacts

and preserve true signals; to improve video coding efficiency, we provide better intra

prediction and adaptive block pattern to enhance the sparsity of residuals; to provide

videos with smooth quality, we propose sparsity based rate control algorithm for video

coding with constraints on distortion fluctuations.









CHAPTER 1
INTRODUCTION

1.1 Motivation

Along with the rapid development of electronic technologies, more and more digital

cameras and video recorders are used in daily life. As a consequence, tremendous

image and video data are produced ready for processing, storing and transmission.

The technology evolution also leads to the increasing resolution of images and videos,

which yields even higher volumes of data. If operating directly in the pixel domain,

challenges arise due to large amounts of data. If we can find an equivalent representation

with smaller amounts of data, the tasks will be easier. Hence, researchers are seeking more

efficient representation of image/video.

Image and video data usually have a lot of redundancies. In most cases, pixel

intensities of images are highly correlated to their neighbor pixels. For video sequences,

they are even correlated to pixels from .,li .1ient frames. The correlation encourages

researchers to explore approaches to eliminate the redundancy. From a statistical

perspective, each image or video corresponds to a point in an extremely high dimensional

space. With high probabilities, images and videos usually belong to a subspace with a

much lower dimension. Then the other dimensions used to describe the data would be

redundant and will not distort the data if discarded.

For decades, researchers have been dedicated to explore approaches that can eliminate

redundant information and provide efficient representations of data. To quantitatively

measure the efficiency of representation, we use the term -1. i-ily" [1]. Sparsity denotes

the number of non-zero items in a set. A sparse representation tends to have a large

portion of zeros. Sparse representation is equivalent to efficient representation.

The approaches in the literature fall into two categories: data independent and

data dependent approaches. The former assumes that data follow a certain model

or a combination of various models. We can represent data using the parameters of









models. If we can find the exact models, the number of nonzero parameters will be much

smaller than the dimension of data. Thus we achieve a sparse representation of the data.

Harmonic analysis [2], which studies how to represent a function as the superposition of

atom waves, represents most approaches in this category. A simple example is the Fourier

series [3], which represents a function as the superposition of sine and cosine functions.

For instance, if the task is to represent a sine function with infinite duration. Since

the sine function is periodic, only the values inside one period are enough to represent

the whole function. Still, there are lots of values just in one period. However, if it is

represented as a Fourier series, only one coefficient is enough to represent it.

A transform is the process that maps the original function to a set of numbers called

transform coefficients. Transform coding p1 i l- a very important role in audio, image

and video compression applications. The history of transform coding goes back to the

1950's [4]. Signals from a vocoder were modeled as a stochastic process and shown to be

compressible using Karhunen-Loeve transform (KLT) [5] made up of the eigenvectors of

the correlation matrix. Karhunen-Loeve transform is the best transform with the least

mean squared error in linear approximation. However, KLT is signal dependent. It is

not useful in real applications that employ a transform. It will be efficient to choose

transforms with fixed bases. Various kinds of transforms are studied for decades.

Discrete cosine transform (DCT) [6] was proposed as an approximation to KLT

with the assumption of first order Gauss-Markov process when the correlation is high

(larger than 0.9). DCT was later used as the key transform in the practical image coders.

DCT combined with scalar quantization and entropy coding was standardized by the

joint picture experts group (JPEG) in the late 1980's. The JPEG image compression

standard serves widely all over the world [7]. In the 1990's, wavelet transform attracted a

lot attention in academic [8] [9] [10] [11] [12]. Wavelet transform yields a multiresolution

representation of signals which consists of octave-band frequency decomposition. The

decomposition provides good frequency selectivity at lower frequencies and good time









selectivity at higher frequencies. This property fits the characteristics of many nature

images. Combined with enhanced quantization technique and entropy coding, discrete

wavelet transform (DWT) leads to the new image compression standard, JPEG2000 [13].

The new standard achieves better compression efficiency than JPEG.

Although DCT and DWT are successful in image compression, how to represent a

signal efficiently is still a big challenge. Smooth functions can be sparsely represented by

many transforms. However, when there are discontinuities (singularity points) in a signal,

it is not an easy job to find a good representation for the signal. Singularities will yield

infinite number of Fourier series terms or infinite number of nonzero Fourier transform

coefficients [3] [14] [15]. If we reconstruct the signal with finite number of terms, Gibbs

phenomena appears around singularity points [16] [17]. Only when all terms are used,

Gibbs phenomena can be removed from the reconstructed signal. Wavelet transform

solved this problem by introducing multiscale analysis [18]. The singularity points can

be captured by changing the scale and position of wavelet functions. Wavelet transform

can resolve one dimensional singularities. When the dimension of signals increases, higher

dimensional transforms usually consist of tensor production of low dimensional transforms.

However, simple tensor production no longer resolves high dimensional singularities.

Ridgelet transform [19] [20], Curvelet transform [21] [22] and Shearlet transform [23] [24]

were introduced to solve singularities along straight lines and C2 curves. Nevertheless,

the solution for a sparse representation for images with singularities along arbitrary

curves is still an open issue. In the first part of this dissertation, we propose two data

independent transforms named ripplet '", f. ,rm 'qj'' I and II. Ripplet transforms aim

to resolve two dimensional singularities in images from different perspectives. The new

transforms are able to obtain sparse representations of images with singularities. These

sparse representations can assist further processing of image data in various applications

such as image denoising, image compressing and image classification.









The problem with approaches in the first category is that the atom functions do not

have much prior knowledge about the data. The efficiency of a representation depends

on how well the transform fits the data. If the data follow exactly the same function with

atom functions, like Fourier series and sine function, the representation will achieve a

sparsity as small as 1. If not, transform will provide no improvement in representing the

data. For instance, a step function needs infinite terms in its Fourier series. To address the

problem, we can form a dictionary including various transforms designed for models ever

found in all kinds of data. Then we search in the set of different transforms until the most

sparse one is found. The problem is that we can not guarantee the dictionary includes all

the models in the real world. Even if we have one, it takes a lot of resources to store the

dictionary and computation to search for the optimal transform.

In the second category, researchers find that if we do not care about the exact type

of model for the data, we can still succeed in solving a lot of problems. In fact, we can

make advantage of some useful observations that there are a lot of redundancies inside

image and video data. Image data have plenty of spatial redundancies. Video sequences

are highly correlated along time in addition to spatial redundancy. Based on the high

correlation, we can use data themselves to assist data processing. For example, in video

coding using part of the data to serve as a prediction or reference, most of redundancies

can be removed. Compared to processing the data directly, it is usually much easier

and more efficient to compress the prediction errors and prediction modes. The actual

algorithms vary a lot according to applications.

We can further classify approaches into two categories based on the range of

correlation: local and non-local. Local means that only data in a local neighborhood will

be used. The assumption behind is that the correlation decreases as the distance increases

and the correlation can be considered as zero if the distance is large enough, which is

true in most data. This category includes Linear Predictive Coding (LPC) [25], Warped

Linear Predictive Coding (WLPC) [26] [27], Differential Pulse-code modulation (DPC'\ )









[28], adaptive DPC'\ 1 [29] in speech processing, autoregressive moving average (ARMA)

model [30] in statistics, and local overcomplete representation [31] for image processing.

Non-local stands for approaches searching for redundancies over the whole data space.

Sometimes the assumption that local neighbors are useful for processing is violated. To

improve the performance, a certain criterion on correlation or similarity is introduced to

exclude useless data and enforce the similarity. Intra prediction and motion compensation

in video coding [32] [33] use the mean squared error to find the best match, which will

improve the compression efficiency. Block-Matching and 3D filtering (BM3D) [34] is the

state-of-the-art image denoising algorithm. BM3D separates true signal and noise by

enforcing sparse representation of similar blocks from all over an image. Variations [35]

[36] based on BM3D also achieved superior performance in several applications in image

and video processing. These algorithms all enjoy the benefit of exploring the sparsity of

signals.

In the second part of this dissertation, we address several practical problems related

to video coding. We proposed sparsity based algorithms to improve the coding efficiency of

modern video coding systems. Sparse properties in video coding systems are explored and

used wisely to serve for different aspects in video coding systems.

1.2 Outline

A detailed outline of this dissertation is presented as follows.

C'! lpter 2 proposes a new transform called Ripplet transform type I by generalizing

Curvelet transform. We introduce a support c and degree d in addition to scale, location

and direction parameters in the definition, while the curvelet transform is just a special

case of the ripplet transform type I with c = 1 and d = 2. The new transform can adapt

to various singularities along arbitrary Cd curves. The flexibility enables the capability

of yielding a more efficient representation for images with singularities. In particular, we

develop forward and backward Ripplet transform type I for both continuous and discrete

cases. To evaluate the performance of proposed transform, Ripplet transform type I is









applied to image compression and image denoising. Our experimental results demonstrate

that the ripplet transform can provide efficient representation of edges in images.

In C'!i lpter 3, we introduce Ripplet transform type II based on generalized Radon

transform. The new transform converts 2D singularities into 1D singularities through

generalized Radon transform. Then 1D wavelet transform is used to resolve the 1D

singularities. Both continuous and discrete Ripplet transform type II are defined.

Orthogonal ripplet transform type II is introduced to further improve sparsity of the

representation. Properties of the new transform are explored. The rotation invariant

property achieves good performance in image classification and retrieval for texture

images.

('!i lpter 2 and C'!i lpter 3 present transform based approaches to achieve sparse

representations. In certain applications, direct transform may not yield a sparse enough

representation of data. Non-transform based preprocessing techniques can be used,

since those techniques themselves can provide sparse representations. In the following

chapters, we will present several sparse based techniques applied in video processing and

compression.

C'!i lpter 4 presents approaches that learn from data. In particular, we propose

a general framework that explores self similarities inside signals to enhance sparse

representations of true signals. The framework unifies three sparsity based denoising

techniques and applies them for video compression artifact removal problem. We

compare and analyze the three techniques from the aspects of operation atom, transform

dimensionality, and quantization impact. The comparison and analysis can serve as

a guideline of applying sparsity based denoising techniques to related problems. The

framework introduces several freedom in each component, which can adapts to various

applications as long as sparsity based approach can solve the problem.

In C'!i lpter 5, we present several techniques to improve the coding efficiency of video

codec through sparsity enhancement. We propose more accurate prediction through









adaptive block pattern and sophisticated prediction models for intra coded frames.

These techniques can improve the video coding performance by improving the sparsity

of prediction residuals. The algorithms are implemented in the hybrid video coding

framework in H.264. Experimental results demonstrate gains over H.264 by proposed

algorithms.

In ('!i lpter 6, we explore the rate-distortion behaviors when introducing new

constraints on distortion fluctuations. Gamma rate theory is proposed to provide guidance

for designing rate control schemes. There is a tradeoff between rate and distortion

fluctuation. Later, a sparsity based rate control algorithm is proposed to improve the

coding efficiency with limitations on bit rate. Various rate-distortion models are employ, -l

to derive the optimal bit allocation for cases with and without constraints on distortion

fluctuations. In the experiments, we evaluate the proposed algorithm by (R, D, y7) based

on the gamma rate theory. Experimental results demonstrate that there is indeed a

tradeoff in practical rate control schemes.

('!i lpter 7 summarizes the whole dissertation.









CHAPTER 2
RIPPLET-I TRANSFORM

2.1 Introduction

Efficient representation of images or signals is critical for image processing, computer

vision, pattern recognition, and image compression. Harmonic analysis [2] provides a

methodology to represent signals efficiently. Specifically, harmonic analysis is intended

to efficiently represent a signal by a weighted sum of basis functions; here the weights

are called coefficients, and the mapping from the input signal to the coefficients is

called transform. In image processing, Fourier transform is usually used. However,

Fourier transform can only provide an efficient representation for smooth images but

not for images that contain edges. Edges or boundaries of objects cause discontinuities

or singularities in image intensity. How to efficiently represent singularities in images

poses a great challenge to harmonic analysis. It is well known that one-dimensional (ID)

singularities in a function (which has finite duration or is periodic) destroy the sparsity

of Fourier series representation of the function, which is known as Gibbs phenomenon. In

contrast, wavelet transform is able to efficiently represent a function with 1D singularities

[11, 12]. However, typical wavelet transform is unable to resolve two-dimensional (2D)

singularities along arbitrarily shaped curves since typical 2D wavelet transform is just a

tensor product of two 1D wavelet transforms, which resolve 1D horizontal and vertical

singularities, respectively.

To overcome the limitation of wavelet, ridgelet transform [19, 37] was introduced.

Ridgelet transform can resolve 1D singularities along an arbitrary direction (including

horizontal and vertical direction). Ridgelet transform provides information about

orientation of linear edges in images since it is based on Radon transform [38], which

is capable of extracting lines of arbitrary orientation.

Since ridgelet transform is not able to resolve 2D singularities, Candes and Donoho

proposed the first generation curvelet transform based on multi-scale ridgelet [39, 40].









Later, they proposed the second generation curvelet transform [41, 42]. Curvelet

transform can resolve 2D singularities along smooth curves. Curvelet transform uses

a parabolic scaling law to achieve anisotropic directionality. From the perspective of

microlocal analysis, the anisotropic property of curvelet transform guarantees resolving

2D singularities along C2 curves [21, 41-43]. Similar to curvelet, contourlet [44, 45] and

bandlet [46] were proposed to resolve 2D singularities.

However, it is not clear why parabolic scaling was chosen for curvelet to achieve

anisotropic directionality. Regarding this, we have two questions: Is the parabolic scaling

law optimal for all types of boundaries? If not, what scaling law will be optimal? To

address these two questions, we intend to generalize the scaling law, which results in a

new transform called ripplet transform Type I. In the rest of this chapter, we use Ripplet-I

in short. Ripplet-I transform generalizes curvelet transform by adding two parameters,

i.e., support c and degree d; hence, curvelet transform is just a special case of ripplet-I

transform with c = 1 and d = 2. The new parameters, i.e., support c and degree d,

provide ripplet-I transform with anisotropy capability of representing singularities along

arbitrarily shaped curves. The ripplet-I transform has the following capabilities.

* Multi-resolution : Ripplet-I transform provides a hierarchical representation of
images. It can successively approximate images from coarse to fine resolutions.

Good localization: Ripplet-I functions have compact support in frequency domain
and decay very fast in spatial domain. So ripplet-I functions are well localized in
both spatial and frequency domains.

High directionality: Ripplet-I functions orient at various directions. With the
increasing of resolution, ripplet-I functions can obtain more directions.

General scaling and support: Ripplet-I functions can represent scaling with
arbitrary degree and support.

Anisotropy: The general scaling and support result in anisotropy of ripplet-I
functions, which guarantees to capture singularities along various curves.

Fast coefficient decay: The magnitudes of ripplet transform coefficients decay
faster than those of other transforms, which means higher energy concentration
ability.









To evaluate the performance of ripplet-I transform for image processing, we conduct

experiments on synthetic and natural images in image compression and denoising

applications. Our experimental results demonstrate that for some images, ripplet-I

transform can represent images more efficiently than DCT and discrete wavelet transform

(DWT), when the compression ratio is high. When used for image compression, ripplet-I

transform based image coding outperforms JPEG for the whole bit rate range; and it

achieves performance comparable to JPEG2000, when the compression ratio is high; but

ripplet transform can provide better visual quality than JPEG2000. Our experimental

results also show that the ripplet-I transform achieves superior performance in image

denoising.

The remainder of this chapter is organized as below. In Section 2.2, we review the

continuous curvelet transform in spatial domain and frequency domain, and analyze the

relations between them. In Section 2.3, we generalize the scaling law of curvelet to define

ripplets and introduce continuous ripplet-I transform and inverse continuous ripplet-I

transform. Then we discuss the discretization of ripplet-I transform in Section 2.4. We

analyze ripplet functions from the perspective of frames in Section 2.5. Section 2.6

presents experimental results that demonstrate the good properties of ripplets.

2.2 Continuous Curvelet Transform

Similar to the definition of wavelets, the whole curvelet family is constructed based

on the element curvelet functions. The element curvelet functions vary from coarse to fine

scales. The curvelet functions are translated and rotated versions of the element functions.

The 2D curvelet function is defined as below [39], [40]


,Yae() = ao(Ro( )), (2-1)

cos 0 sin 0
where Ro = is the rotation matrix, which rotates 0 radians. x and b are
sin cosO
2D vectors. 7a6o is the element curvelet function.









The element curvelet function with scale parameter a is defined in the frequency

domain in polar coordinates [40].


a(r, u) = a3/4 W(a r)V(w/ a),


where 7a(r, w) is the Fourier transform of 7a0o in polar coordinate system. W(r) is a

'radial window' and V(w) is an 'angular window'. These two windows have compact

supports on [1/2, 2] and [-1, 1], respectively. They satisfy the following admissibility

conditions

2 W2 (r) = 1,
1/2 (

f V2(t)dt = 1.


(2-2)


(2-3)


(2-4)


These two windows partition the polar frequency domain into 'wedges' shown in

Fig. 2-1.



















Figure 2-1. The tiling of polar frequency domain. The shadowed 'wedge' corresponds to
the frequency transform of the element function.


From Eq.(2-2), (2-3) and (2-4), we know that the Fourier transform of curvelet

function has a compact support in a small region which is the Cartesian product of









r c [a, '] and uw E [- V]. Curvelet also has small effective regions and decays

rapidly in spatial domain. Compared to wavelets, in addition to the scaling information

and position information, curvelet functions have another parameter to represent

directional information. An intuitive way to obtain direction information is using rotated

wavelet. However, the isotropic property of rotated wavelet transform makes the rotation

unsuitable for resolving the wavefront set [41], [42]. The parabolic scaling used in the

definition of curvelet functions guarantees the effective length and width of the region

to satisfy: width w length2 and leads to anisotropic behavior of curvelets, which makes

curvelet transform suitable for resolving arbitrary wavefront. The parabolic scaling is the

most important property of curvelet transform and also the key difference between the

curvelet and the rotated wavelet.

Given a 2D integrable function f(x), the continuous curvelet transform is defined as

the inner product of f(x) and the curvelet function [41], [42], [22]

C(a, b, 0) = (f, 7,) = f()x)y (x)dx, (2-5)

where C(a, b, 0) are the curvelet coefficients and (.) denotes the conjugate operator. The

curvelet coefficients describe the characteristics of signal at various scales, locations and

directions.

In fact, the curvelet transform only captures the characteristics of high frequency

components of f(x), since the scale parameter a can not take the value of infinity. So

the 'full' continuous curvelet transform consists of fine-scale curvelet transform and

coarse-scale isotropic wavelet transform. The 'full' curvelet transform is invertible. We

can perfectly reconstruct the input function based on its curvelet coefficients. With the

'full' curvelet transform, the Parseval formula holds [41], [42], [22]. If f(x) is a high-pass

function, it can be reconstructed from the coefficients obtained from Eq.(2-5) through

f(2) = C(a, b, 0)7fo()dadbdO/a (2 6)









and

||Ilf 2 C(a, 0) 2dadbdO/a3 (2-7)

2.3 Continuous Ripplet-I Transform

In this section, we introduce ripplet-I functions and continuous ripplet-I transform.

We first generalize curvelet functions to define ripplet-I functions and then present the

definition of continuous ripplet-I transform.

2.3.1 Ripplets

From the review in Section 2.2, we know that parabolic scaling used in curvelets leads

to resolving of 2D singularities. However, there is no evidence to show that the parabolic

scaling is the optimal scaling law. We can define the scaling law in a more broader scope

and more flexible way. The ripplet-I function can be generated following the same strategy

in Eq. (2-1)

PaeO(x = Pao(RO(x- b)), (2-8)

cos 0 sin 0
where paGo() is the ripplet-I element function and Re = is the rotation
sinO cosO
matrix. We define the element function of ripplet-I in frequency domain as

1 1+d ad
a(r, L,) = a d W(a r) V( w), (2-9)
c-a

where pa(r, a) are the Fourier transform of pgao(x). W(r) is the 'radial window' on

[1/2, 2] and V(w) is the 'angular window' on [-1, 1]. They also obey the admissibility

conditions (2-3) and (2-4).

The set of functions {paeO} is defined as ripplet-I functions or ripplets for short,

because in spatial domain these functions have ripple-like shapes. c determines the

support of ripplets and d is defined as the degree of ripplets. Curvelet is just the special

case of ripplet-I for c = 1, d = 2. Fig. 2-2 shows ripplets with different c and different d in

spatial domain. From Fig. 2-2, we can see that ripplet-I functions decay very fast outside

the effective region, which is an ellipse with the ii i, r axis pointing in the direction of




































A a = 3, 0 = 37/16, c 1. d = 2 (curvelet)


C a 4, 0 = 37/16, c 1, d 4


D a 4, 0 = 37/16, c 1.5, d 4


Figure 2-2. Ripplet-I functions in spatial domain with different degrees
which are all located in the center, i.e., b = 0.


and supports,


B a = 3. 0 = 37x/16, c = 1.5. d = 2









the ripplet. The 1 i i, jr axis is defined as the effective length and the minor axis, which is

orthogonal to the 1 i i' ', axis, is the effective width. The values of c and d will actually

affect the effective length and width of ripplets in spatial domain. The effective region has

the following properties for its length and width: width w c x length. For fixed d, the

larger c is, the shorter the width is and the longer the length is. When c is fixed and d

gets larger, the width gets shorter and the length is elongated. The customizable effective

region tuned by support c and degree d bespeaks the most distinctive property of ripplets

-the general scaling. For c = 1, d = 1, both axis directions are scaled in the same way.

So ripplet-I with d = 1 will not have the anisotropic behavior. For d > 1, the anisotropic

property is reserved for ripplet transform. For d = 2, ripplets have parabolic li:-. For

d = 3, ripplets have cubic scaling; and so forth. Therefore, the anisotropy provides ripplets

the capability of capturing singularities along arbitrary curves.

The ripplets as the generalization of curvelet have almost all the properties of curvelet

except the parabolic scaling. Ripplet-I transform can get multi-resolution analysis of data.

For each scale, ripplets have different compact supports such that ripplets can localize the

singularities more accurately. Ripplet-I functions are also highly directional to capture the

orientations of singularities.

2.3.2 Continuous Ripplet-I Transform

For a 2D integrable function f(x), the continuous ripplet transform is defined as the

inner product of f(x) and ripplets


R(a, b, 0) = (f, pa= ( f)Pao(W)dx, (210)


where R(a, b, 0) are the ripplet-I coefficients. When the ripplet-I function intersects

with curves in images, the corresponding coefficients will have large magnitude, and the

coefficients decay rapidly along the direction of singularity as a -- 0.

The ripplet-I transform defined in Eq. (3-15) has the same issues as curvelet

transform does, which is that the continuous ripplet-I transform can only capture the









behavior of f(x) in high frequency bands. To establish the 'full' continuous ripplet

transform, we need to apply isotropic wavelet transform to represent the low frequency

information. However, what really matters is the behavior of the transform in the high

frequency bands, where the difference between curvelet and ripplet-I lies.

Now we transform images into another domain that we call ripplet domain. The

challenges arise when we try to reconstruct images from ripplet-I coefficients. The

theorems below introduce the inverse ripplet-I transform.

Theorem 1. Let f E L2 be a high-pass function,which means that its Fourier tr ,.-1f.',,

vanishes for wu| < 2/ao and ao is a constant. f can be reproduced by its ripplet-I I,,-,,.frm

through

f(=) = R(a, b, O)paO()dadbdo/a3, (2- 11)

And a Parseval formula for f holds


If \1= R(a, b, ) 12dadbdO/a3, (2-12)

Proof. For r > 2/ao, (2-3) can be rewritten as
/2 da)2 [o' da_- [ao da
SW(a)2 W(2da W(a r)2 1 (2-13)
d /2 a Ja a

Based on the admissibility condition Eq.(2-3), we have

J d dl- (
( ))2d = cad (2-14)
Jo ca

For a special ripplet-I pado(x), its Fourier transform has the property as below.

ao 2Jo da
a, / IPaoo (0 a 12 do d2
3a
ao j27, a
ao 2 V(a d 0))2 1 +d da
W(a r)2V( ( ))2-a- d do3
o 0 ca c a3
1 (2-15)








Define
ga,o(x) = J(paO' f)PabOdb (2-16)

We have Pa6o = Pao(x b), so

gao(x) pPao(x b) (Pao6(x b)(y b)f(y)dy) db

= JPa (x b)( f ) (b)db
= (Pao0 Pao f)(x) (2-17)

According to the property of convolution, we can obtain the Fourier transform of g as

ga,o(w) = PaO Paf(w ) = Paoo0() 2f() (2-18)

Using (2-15), we get

S aoa()dO = f ao() 12()dae; = ( f ) (2-19)

Further,

f(x) = if(w)eJxd

= aO()dO-X dwxdw

= 9 ga,(x)dO

(Pao' f)Pao a (220)

Using Plancherel formula and Eq.(2-15), we have

J R(a, b, 0) 2dadbdO/a3 = (po* f)(b) 2daddO/a3

= | ( \)1 2 12( 2dO d a
a3
= 1 (U) 2dw

= f112 (2-21)











Theorem 2. Let f E L2. There is a bandlimited purely radial function 0 in L2 and of
rapid /...' so that, if (ao,b(X) = (X- b),

f(x) = J (0 ,b f )o,bdb+ o'0 / J (f, Pa})Pa(x)dadbd0/a3 (2-22)

and

2= J I(ao,b, f)2db+ o j (f, ) 2dadsdO/a3 (2-23)

Since the issue of interest is just the fine-scale elements or high frequency bands,
the choice of the wavelet transform for the coarse-scale can be very flexible. Similarly,
Theorem 2 can be easily proved using the same arguments in [42].
2.4 Discrete Ripplet-I Transform

In the previous section, we introduced ripplets and continuous ripplet-I transform.
Digital image processing needs discrete transforms instead of continuous transforms.
Discretization of ripplet transform is proposed and analyzed in this section.
The discretization of continuous ripplet-I transform is actually based on the
discretization of the parameters of ripplets. For the scale parameter a, we sample at
dyadic intervals. The position parameter b and rotation parameter 0 are sampled at
equalspaced intervals. a, b and 0 are substituted with discrete parameters ay, bk and 0/,
which satisfy that aj = 2- Ek = [c 2 kt, 2-j/d k2] T and 0i = 2T .2-(1-1/d)j .
where k = [ki, k2]T, ()Tdenotes the transpose of a vector and j, ki, k2, I ZE The degree
of ripplets can take value from R. Since any real number can be approximated by rational
numbers, we can represent d with d = n/m, n, m 0 E Z. Usually, we prefer n, m E N
and n, m are both primes. In the frequency domain, the corresponding frequency response
of ripplet-I function is in the form

1 a +n 1 2 u- ). (2 -24)
pj(r, w) = a 2n W(2 -' r)V( 1 2-U n (2-24)
V/c c









The 'wedge' corresponding to the ripplet-I function in the frequency domain is


H,i(r, 8) = {2 < Irl < 22j, 1 -7 2-(1-1/d)] /I < 72-J}. (2-25)
c 2

The discrete ripplet-I transform of an M x N image f(ni, n2) will be in the form of
M-1 N-l
RJ,,i = E f(ni, n2)Pjk(ki, n2), (2-26)
ni=0 n2=0

where Rjk,k are the ripplet-I coefficients.

The image can be reconstructed through inverse discrete ripplet transform


f(ni, n2) =' 5 RJkIPJ,(n' n2). (2-27)
j

2.5 Tight Frame

From the point of view of frame, ripplets provide a new tight frame with sparse

representations for images with discontinuities along Cd curves. Before we start the proof

of tight frame, we introduce a lemma to facilitate the proof of the tight frame.

Lemma 3. Suppose that {0} C L2(R2) is a bandlimited function with


supp(0) C [-7rA, 7A] x [-TrB, B].


Suppose that g E L2(R2) is 1. ,'. / in the f..ji'. ,',. /' domain by


(p) = $(p) 12( p)

where g(w), ('), and f(w) are the Fourier i' ,,,uf.rm of g, 0 and f, 1, .:. /,;;

If we have a set of functions {Qk(X) = 0(xi ki/A, x2 k2/B)}, then we have


g(x)= (f, 0)k)Ok(X)
k
and

Igl 2 = (f, k)12
k









Proof. Based on definition, we have ck = Q(x k). Since the Fourier transform of g is the
multiplication of 0 and f, we have

g(x) = 0 *0 f(x)

= Z0(x k) (* f)(k)
k
= ckk(x) J((x k)f(x)dx
k
= (f, Ok)0k(x) (2 28)
k



Ig12 = g(x)g(x)dx

= / (<, )k)(k(X) k k

= E l(f, Ok) 2 k(X)k(x)dx
k k

= I (f, ck)12 S/ k( ) 2d (2-29)
k k
From the definition of ripplets, all translated version of element ripplet-I will cover all
the bands. Then we have Ek f lIk(W) 2dw = 1. So


g = ~ (f, Ok)2 (2-30)
k



Theorem 4. Ripplet-I functions provide a tight frame given ,:, L2 function f.


lf| .2 = IR(j, k, /)2 (2 31)
j,k,I

The theorem can be proved with the translation parameter b and I = 0, based on the
lemma above. For arbitrary /, we can rotate the coordinate to get a / = 0, where Lemma 3

applies.









2.6 Experimental Results

In this section we present experimental results that demonstrate properties of ripplet-I

transform and its potential applications.

2.6.1 Nonlinear Approximation

To quantify the performance of sparse representation of transforms, nonlinear

approximation (NLA) [2] of images is adopted as a common comparison approach.

Suppose we have orthonormal basis {Qk} and the corresponding coefficients Ck =< g, Qk >.

These coefficients are sorted in descending order with respect to the magnitude. The index

k is defined by


I co > Icll > IC2 _> > Ck >... > IC,-1 > I > ...


The nonlinear approximation is obtained using n-largest coefficients as below
n-1
g = Z c4 i. (2-32)
i=o

Since ripplet-I transform provides a tight frame, the concentration of ripplet-I

coefficients will lead to more accurate approximation in NLA. The faster the coefficients

decay, the more compact energy will be allocated to the fewer large coefficients. To

demonstrate the decay rate of ripplet-I transform coefficients, we first sort the ripplet-I

coefficients with respect to their magnitudes and compare them to sorted wavelet

coefficients in Fig. 2-3. It sil.-- -I that the coefficients of ripplet-I transform decay

faster than those of wavelet transform.

We use peak signal-to-noise ratio (PSNR) versus number of retained coefficients to

measure the quality of reconstructed images. PSNR is defined as

f2
PSNR = 10 x loglo( ax), (2-33)
mse
































101 102 103
Index of coefficients

A Lena


Ripplet
Wavelet

















104 105






Ripplet
Wavelet


100 101 102 103 104 105
Index of coefficients

B Barbara


Figure 2-3. The comparison of coefficients between ripplet-I transform and wavelet
transform.



where fmax is the maximum value of image intensities and mse is the mean square error

between the reconstructed image fMXN and original one fMXN

M-1 N-1
mse = M I(ni, n2) (n1, n2) 2 (2-34)
nt=0 n2=0









2.6.1.1 Ripplet-I with different degrees

The images we used in the experiments are synthetic images with different edges

which exhibit various 2D singularities along different curves shown in Fig. 2-4. Multiple

lines and curves are synthesized with different coordinates to provide singularities along

different curves. The truncated Gaussian image(Fig. 2-4A) presents a smooth changing

part as well as singularity introduced by truncating.

The performance comparison between ripplets with fixed support but different

degrees is shown in Fig. 2-5. To achieve the same PSNR, high degree ripplet-I needs fewer

coefficients than low degree ripplet. There is a big performance gap between degree 1

ripplet-I and others. For the same number of coefficients, ripplet with degree 1 achieves

almost 2 dB lower than others in PSNR. In other words, the degree 1 ripplet-I needs more

coefficients to achieve the same PSNR as other high degree ripplets. Degree 1 ripplet-I

has isotropic behavior and is not directionally sensitive, whereas the other ripplets are

anisotropic and can capture the singularities along curves in the test images. The gap

between performance curves shows that the anisotropy helps a lot in representing 2D

singularities efficiently. Ripplet-I transforms with degree 4 and degree 3 achieve the same

highest PSNR for the same number of coefficients. High degree ripplet-I has more compact

support and more directional sensitivity, which can capture more accurate information

about singularities. In our experiments, when d > 3, the performance is the same with

d = 3. Since the discrete implementation of ripplet-I is based on the power of 2, the

difference in performance brought by degree d only appears in fine scales. The higher the

degree is, the finer the scale is. Usually, for normal image size such as 256 x 256, d = 3 is

the highest degree used in our experiments.

2.6.1.2 Comparison with other transforms

To make comparison among different transforms, we present the results of nonlinear

approximation using discrete wavelet transform, discrete cosine transform and discrete




























A Truncated Gaussian image


C Parabolic curves D Cubic curves

Figure 2-4. Test images.


ripplet-I transform. The wavelet used in DWT is '9-7' biorthogonal wavelet [9], [47]. The

discrete ripplet-I transform uses ripplet-I with c = 1 and d = 3.

The results in Fig. 2-6A show that ripplet-I can achieve the highest PSNR when the

number of retained coefficients is less than 5500. Meanwhile, ripplet-I can provide better

visual quality than DWT and DCT. We can see that ripplet-I avoids the 'ringing' artifacts


B Multiple lines
















54 44
42
52- 8
40 -
50 38

Z 48 Z 36
C" // C
34
32

d=1 .d= 1
d=2 d=2
d=3 d=3
d=4 I d=4
40r--- 26 ---
1000 2000 3000 4000 5000 6000 7000 8000 9000 10000 1000 2000 3000 4000 5000 6000 7000 8000 9000 10000
Number of retained coefficients Number of retained coefficients

A Truncated Gaussian function image B Multiple lines


46 46

44 44

42 42,
42-
D 40
40
S338
S36) u) 36
36
34
34 32
^ d= 1 3d= 1
d=2 / d=2
d=3 d=3
d=4 d=4
30 28f
1000 2000 3000 4000 5000 6000 7000 8000 9000 10000 1000 2000 3000 4000 5000 6000 7000 8000 9000 10000
Number of retained coefficients Number of retained coefficients

C Parabolic curves D Cubic curves


Figure 2-5. Comparing nonlinear approximation performance of ripplets with fixed

support and different degrees corresponding to the test images in Fig. 2-4.



of wavelet as shown in Fig. 2-6C and blocky artifacts of DCT as shown in Fig. 2-6D.


However, when using more coefficients, ripplet-I will no longer be the best. Therefore,


ripplets have the strong capability of representing the structure of images with fewer


number of coefficients.


2.6.2 Image Compression


Inspired by the sparse representation of ripplet-I transform for images, we applied


ripplet-I transform to image compression. For image compression application, we simply


replaced the transform in a typical image coding scheme. The image compression codec we









implemented consists of ripplet-I transform, quantization of ripplet coefficients, coefficient

coding and entropy coding. In this implementation, uniform scalar quantizer was adopted.

We emplo,. .1 the EBCOT [48] used in JPEG2000 [13] to code the coefficients, in which an

adaptive binary arithmetic coder is used for entropy coding [49].

We compared the performance of ripplet, JPEG and JPEG 2000, for three cases,

namely, the cropped image from 'barbara', the texture-rich images in the USC database

[50] and natural images.

2.6.2.1 Comparison on cropped images

In this experiment, we conduct comparisons on a cropped patch with rich texture

from test image 'barbara'. The result shows that the ripplet-I based codec achieves 3 dB

higher PSNR than JPEG2000 for the same bit-rate. Ripplet-I with c = 1, d = 3 can

achieve higher PSNR than Curvelet (ripplet-I with c = 1, d = 2) for the same bit-rate.In

Fig. 2-7, we compare the subjective quality of the ripplet-I based codec and JPEG2000.

It is obvious that the ripplet based codec preserves more details of texture, compared to

JPEG2000.

2.6.2.2 Comparison on texture-tich images

We also tested the ripplet-I transform on texture-rich images given in Fig. 2-8. As

shown in Table 2-1, the ripplet-I based codec achieves a slightly higher PSNR at low bit

rate, compared to JPEG2000.

Table 2-1. PSNR comparison of Ripplet-I and JPEG2000 at 0.03125 bpp
Texture images PSNR of Ripplet-I (dB) PSNR of JPEG 2000(dB)
(a) 14.79 14.76
(b) 11.58 11.45
(c) 11.72 11.59
(d) 20.82 20.72
(e) 20.99 20.96


2.6.2.3 Comparison on natural images

We compared the performance of ripplet-I transform on natural images. In this

simulation, we used ripplet-I with c = 1, d = 3 and c = 1, d = 2 (e.g. curvelet). The









results shown in Fig. 2-9 and Table 2-2 indicate that the ripplet-I based codec outperforms

JPEG by up to 3.3 dB on average at the same bit-rate. The ripplet-I with degree 3

outperforms curvelet (degree 2) as shown in Fig. 2-9A, 2-9B. In Fig. 2-9C, ripplet-I with

degree 3 can achieve similar PSNR as curvelet does, especially in low bit-rate. Compared

to JPEG2000, the ripplet-I based codec achieves about 1 dB lower PSNR on average at

the same bit-rate. However, the ripplet-I based codec can provide better subjective quality

as shown in Fig. 2-10. When compression ratio is high, there are a lot of white spots

around the face in the image coded by JPEG2000 in Fig. 2-10B, while no obvious artifacts

appear in the image coded using ripplet-I transform in Fig. 2-10A. Moreover, the ripplet-I

based codec keeps more details around the beard in 'mandrill' than JPEG2000 does.

Table 2-2. Average PSNR gain of ripplet-I (c = 1, d = 3) based codec, compared to JPEG
and JPEG2000, respectively
Barbara Mandrill Tiffany
Average PSNR gain (dB) over JPEG 2.9 1.2 3.3
Average PSNR gain (dB) over JPEG2000 -1.3 -0.8 -1.6


2.6.3 Image Denoising

The proposed ripplet-I transform can be applied as a new method for noise removal in

signals and images. Suppose an image f(ni, n2) is corrupted by the additive noise,


g(ni, n2) = f(nl, n2) + n(n1, n2) (2-35)


where n(nl, n2) are independent, identically distributed Gaussian random variables with

zero mean and variance a2.

Image denoising algorithms vary from simple thresholding to complicate model based

methods. Since ripplet-I transform provides a sparse representation of images, simple hard

thresholding in ripplet transform domain can remove most of noise. In our experiments,

we use the following hard thresholding scheme: in the transform domain, a coefficient

whose magnitude is smaller than the pre-determined threshold is set to zero; otherwise,

the coefficient is unchanged. Then we reconstruct the image by inverse transform. In the









experiments of this chapter, we search over a selected range for the optimal threshold

that provides the highest PSNR. To ensure a fair comparison, we apply the same optimal

thresholding searching strategy to other transforms to be compared with.

As shown in Fig. 2-11, the ripplet-I transform can restore the edges better than DWT.

The reason is that ripplet transform can represent these edges very sparsely, whereas noise

will have small values in all ripplet-I coefficients. Then hard thresholding can remove the

noise with little damage to the images. On the other hand, wavelet transform can not

represent edges well; so edges are blurred due to hard thresholding.

2.7 Summary

To represent curves more efficiently in images, we generalized the curvelet transform

and introduced the new ripplet-I transform. The main contributions of this chapter are

* Ripplet-I functions form a new tight frame for functional space. The ripplets have
good localization in both spatial and frequency domain. The new transform provides
a more efficient representation.

The highly directional ripplets have general scaling with arbitrary degree and
support, which can capture 2D singularities along different curves in any directions.

Experimental results indicated that ripplet-I transform can provide a more efficient

representation of images with singularities along smooth curves. Using a few coefficients,

ripplet-I can outperform DCT and wavelet transform in nonlinear approximation. It is

promising to combine ripplets and other transforms such as DCT to represent the entire

image, which contains object boundaries and textures. Ripplet-I transform is used to

represent the structure and texture, while DCT is used to compress smooth parts. The

sparse representation of ripplet-I transform also demonstrates potential success in image

denoising.
























A Performance comparison of NLA
using different transforms: DCT,
DWT and discrete ripplet-I transform


2A.


-4$


a~t


jI&"
4
A



r


B Original image


C Ripplet-I based NLA with 5000 largest
coefficients, PSNR = 31.13 dB.


a0h.


,'4L


I


Ik

- A,


D Wavelet based NLA with 5000 largest E DCT based NLA with 5000 largest coeffi-
coefficients, PSNR = 30.13 dB. clients, PSNR = 29.90 dB.

Figure 2-6. Performance comparisons.


t'
C, El?~

/ v


S4



































B Ripplet-I c = 1, d = 3, PSNR 25.39 dB


C Curvelet (Ripplet-I c
24.12 dB


1, d = 2), PSNR


D JPEG2000, PSNR 22.37 dB


Figure 2-7. The visual quality comparison between ripplet-I based image codec and
JPEG2000 for a patch cropped from 'barbara', when bpp is equal to 0.3.


A original crop





































A B


D E

Figure 2-8. Texture-rich images used in our experiment
























































A Barbara


-,d,1




28
B




26



24 -
//






22



20 <



18
S02 04 06 08 1 2
bpp


B Mandrill


tiffany
42


40


38


36


34

*/



30


28 /


26 -


24
0 02 04 06 08 1 2
bpp


C Tiffany




Figure 2-9. PSNR vs. bpp for ripplet-I based image codec, JPEG and JPEG2000.



































B Ripplet-I, PSNR = 22.76 dB


C JPEG2000, PSNR 23.18 dB


Figure 2-10. The visual quality comparison between ripplet-I based image codec and
JPEG2000 for 'mandrill', when bpp is equal to 0.25.


A Original image




































B noisy image, PSNR = 24.61 dB


C Ripplet-I transform, PSNR = 27.55 dB D DWT, PSNR = 27.01 dB

Figure 2-11. A scale-up details of denoised test image 'barbara'. The standard variance of
noise is 15.


A original image









CHAPTER 3
RIPPLET-II TRANSFORM

3.1 Introduction

Efficient representation of images or signals is critical for image processing, computer

vision, pattern recognition, and image compression. Harmonic analysis [2] provides a

methodology to represent signals efficiently. Specifically, harmonic analysis is intended

to efficiently represent a signal by a weighted sum of basis functions; here the weights

are called coefficients, and the mapping from the input signal to the coefficients is

called transform. In image processing, Fourier transform is usually used. However,

Fourier transform can only provide an efficient representation for smooth images but

not for images that contain edges. Edges or boundaries of objects cause discontinuities

or singularities in image intensity. How to efficiently represent singularities in images

poses a great challenge to harmonic analysis. It is well known that one-dimensional (ID)

singularities in a function (which has finite duration or is periodic) destroy the sparsity

of Fourier series representation of the function, which is known as Gibbs phenomenon. In

contrast, wavelet transform is able to efficiently represent a function with 1D singularities

[11, 12]. However, typical wavelet transform is unable to resolve two-dimensional (2D)

singularities along arbitrarily shaped curves since typical 2D wavelet transform is just a

tensor product of two 1D wavelet transforms, which resolve 1D horizontal and vertical

singularities, respectively.

To overcome the limitation of wavelet, ridgelet transform [19, 37] was introduced.

Ridgelet transform can resolve 1D singularities along an arbitrary direction (including

horizontal and vertical direction). Ridgelet transform provides information about

orientation of linear edges in images since it is based on Radon transform [38], which

is capable of extracting lines of arbitrary orientation.

Since ridgelet transform is not able to resolve 2D singularities, Candes and Donoho

proposed the first generation curvelet transform based on multi-scale ridgelet [39, 40].









Later, they proposed the second generation curvelet transform [41, 42]. Curvelet

transform can resolve 2D singularities along smooth curves. Curvelet transform uses

a parabolic scaling law to achieve anisotropic directionality. From the perspective of

microlocal analysis, the anisotropic property of curvelet transform guarantees resolving

2D singularities along C2 curves [21, 41-43]. Similar to curvelet, contourlet [44, 45] and

bandlet [46] were proposed to resolve 2D singularities.

However, it is not clear why parabolic scaling was chosen for curvelet to achieve

anisotropic directionality. To address this, we [51] proposed a new transform called ripplet

transform Type I (ripplet-I), which generalizes the scaling law. Specifically, ripplet-I

transform generalizes curvelet transform by adding two parameters, i.e., support c and

degree d; hence, curvelet transform is just a special case of ripplet-I with c = 1 and

d = 2. The new parameters, i.e., support c and degree d, provide ripplet-I with anisotropy

capability of representing 2D singularities along arbitrarily shaped curves.

Inspired by the success of ridgelet transform, we [52] proposed a new transform called

ripplet transform Type II (ripplet-II), which is based on generalized Radon transform [53]

[54]. The generalized Radon transform converts curves to points. It creates peaks located

at the corresponding curve parameters. Intuitively, our ripplet-II transform consists of

two steps: 1) use generalized Radon transform to convert singularities along curves into

point singularities in generalized Radon domain; 2) use wavelet transform to resolve

point singularities in generalized Radon domain. In this chapter, we extend the ripplet-II

transform in more cases and orthogonal ripplet-II transform is proposed.

To elaborate, we first define the ripplet-II functions and develop ripplet-II transform

and orthogonal ripplet-II transform in the continuous space. Then the discrete ripplet-II

transform and orthogonal ripplet-II transform are defined. Ridgelet transform is just a

special case of ripplet-II transform with degree 1. Properties of ripplet-II transform are

explored and demonstrated by experimental results.









Experimental results in texture classification and image retrieval show that ripplet-II

transform provides better feature extraction capability than ridgelet and wavelet based

approaches.

The reminder of this chapter is organized as below. Section 3.2 reviews generalized

Radon transform. In Section 3.3, we introduce ripplet-II transform in both continuous and

discrete cases. Section 3.4 presents the properties of ripplet-II transform. Experimental

results are shown in Section 3.5 and followed by conclusion in Section 3.6.

3.2 Generalized Radon Transform

Radon transform is widely applied to tomography [55]. Classical Radon transform is

defined in 2D space as the integral of an input 2D function over straight lines. For a 2D

integrable real-valued function f(x, y) where (x, y) E R2, classical Radon transform of

f(x, y) is defined by


R(r, 8)=/ f(x, y)6(xcos0+ysin r)dxdy (3-1)

Or, we can convert f(x, y) to f(p, Q) in polar coordinate system, then classical Radon

transform can be calculated by


R(r, 0) = J f (p, )6(p cos(t 0) r)pdpd (3-2)

The classical Radon transform is invertible. The original function can be reconstructed

based on the Projection-slice theorem [14]. To extend the classical Radon transform,

researchers proposed generalized Radon transform, which is based on an integral along

a family of curves [53] [54]. In the polar system with coordinates (p, 4), a curve can be

defined by

l/dcos( ( 0)) = rl (3 3)

where r and 0 are fixed, and d denotes degree. For d = 1 and d = 2, Eq. (3-3) represents

straight line and parabola as shown in Figure 3-1A and 3-1B, respectively. For d = -1

and d = -2, Eq. (3-3) represents circles through the origin and Cardioids as shown in





















B C


D E F

Figure 3-1. Curves defined by Eq. (3-3) in Cartesian coordinates. (a) d = 1. (b) d = 2.
(c) d = 3. (d) d = -1. (e) d = -2. (f) d = -3.


Figure 3-ID and 3-1E, respectively. When 0 < d < 1 or -1 < d < 0, curves intersect

themselves at least once. Then we have a single curve. Otherwise, curves do not intersect

themselves, which will lead to complicate situations. So we only consider Ibl > 1. We refer

d > 0 as 'positive curves' and d < 0 as 'negative curves' in the rest of this chapter.

The generalized Radon transform along curves can be defined in the polar coordinates

(P, 0) by


GRd(r,e ) = ( p, r pcoSd(( O)/d))dpdO (3-4)









If d = 1, Eq. (3-4) reduces to the classical Radon transform in Eq. (3-2). If d = 2,

Eq. (3-4) becomes [56]

GR2(r, ) = 2/r p/f(p/2, 2')6(p'cos(' 20)- /)d'dp'

= 2vrR[f(p'2, 2')](vr, 0/2) (3-5)

where R[f(x, y)](r, 8) denotes the classical Radon transform that maps f(x, y) to R(r, 0),

and is defined in Eq. (3-1); note that f(p, 4) under the polar coordinate system needs to

be converted to f(x, y) under Cartesian coordinate system before computing Eq. (3-1).

Eq. (3-5) shows that for d = 2, the generalized Radon transform can be implemented via

the classical Radon transform with appropriate substitutions of variables.

For the general case, i.e., d E Z, the generalized Radon transform can be computed

via Fourier series [53] [54]. Let f(p, 0) be a 2D function defined in polar coordinates (p, 4)

and GRd(r, 0) be its generalized Radon transform. Assume that the Fourier series for

f(p, 0) exists, i.e.,
+00
f(p, )= S f(p)e in (3-6)
n--oo
where

fo(p) f(p, 4)e-io'd4 (3-7)

Then the generalized Radon transform can be computed by
+00
GRd(r, O)= I gn(r)e'" (3-8)
n=-oo
where for d > 0

(r = 2 f(p) cos{(nd) cos-'(r/p)1/d}
g(r) = 2 (1 (r/p)2/d)1/2 dp

= 2 f(p) x (1 (r/p)2d)-1/2 X Tnd((r/p) /d)dp (3-9)

where Tn(-) is the C'!. I ,l--!v polynomial of degree n, and fn(p) is given by Eq. (3-7).








Putting Eqs. (3-6),(3-7),(3-8),(3-9) together, we have the generalized Radon
transform of the function f as



GR(r, 0) = 2 Z [ f (p, )e-'idOx (1-(r/)2d)-12 Td((r/ p) d)dp]eino1 (3-10)
n---oo -"

The inverse transform is defined by

+ 1 d *, 1 2' -*27x
f(p, ) = GRd(r, 0)e0 x Tnd((rp) d
n=. dp j

x((r/p)2/d )-1/2 1/d drd]ein (3-11)
r

For negative curves (i.e. d < 0), the Generalized Radon transform is



GR(r, 0) = 2 [ f(p, )e-ind x ( (1 (r/p)-2/d)-1/2 T -nd((r/p) -1d)dp]ei

(3-12)
The inverse transform for negative curves is defined by

f (p, ) = d p GRd(r, )e-in x Tnd((r/p)-1/d)

x((r/p)-2/d )-1/ 1/ddrd0]e'in (3-13)
r

3.3 Ripplet-II Transform
3.3.1 Continuous Ripplet-II Transform
To present ripplet-II transform, we need to define ripplet-II functions first. Given a
smooth univariate wavelet function p : R -+ R with f pt(t)dt = 0, we define a bivariate
function ha,b,d,O : R2 R @2 in the polar coordinate system by

S.,,,(,) = a-1/2p((pcosd((o )/d) b)/a) (3-14)











where a > 0 denotes scale, bE R denotes translation, d E N denotes degree, and

0 c [0, 27) denotes orientation. Function bab,d,0 is called ripplet-II function. Here, we only

consider d > 0 (i.e. positive curves), since positive curves are open curves. Examples of

ripplet-II functions with different parameter settings are shown in Figure 3-2. Ripplet-II

can be scaled, translated and rotated according to the parameters a, b, 0. Note that when

d = 1, ripplet-II reduces to ridgelet as shown in Figure 3-3; i.e., ridgelet transform is just a

special case of ripplet-II transform with d = 1.


21-

0-

2-
-3,'
300 '
200
100
100
0 0


500
400


200
100 100
100
0 0

B


S 500
400


250
200


2 0 x 0 0

C D


Figure 3-2. Ripplet-II functions in Cartesian coordinates (xi, x2) (a) a = 1, b = 0, d = 2
and 0 0 .(b)a = 2, b = 0, d = 2 and 0 = 0 .(c) a = 1, b = 0.05, d = 2 and
0 = 0 .(d)a = 1, b = 0, d = 2 and 0 = 30.


S 500
400


"QfIU























-0.5 -1

-1 -2
300 300
500 500
200 4 5. 00200 4 00
400 400
0300 10 00
100 200 300100
1000 x 100
2 0 0 x 2 0 0xl


A B





1 2-


05- 1
0-
0-1


-0.5 -2
300 20 300 .
4200 4 500 200 500
300 300
100 3 200 100 100
2000
2 0 0 X2 0 0

C D


Figure 3-3. Ripplet-II functions in Cartesian coordinates (xi, x2) (a) a = 1, b = 0, d = 1
and 0 = 0 .(b)a = 2, b = 0, d = 1 and 0 = 0 .(c) a = 1, b= 0.05, d = 1 and
0 = 0 .(d)a = 1, b = 0, d = 1 and 0 = 30



3.3.1.1 Forward transform


Ripplet-II transform of a real-valued 2D function f is defined as the inner product


between the function f and ripplet-II functions



07(a, b, d, 0) = 7,, p,(p, )f (p, )pdpd (3-15)



where b is the complex conjugate of tb, and f(p, 0) is under the polar coordinate system.


Ripplet-II transform has the capability of capturing structure information along


arbitrary curves by tuning the scale, location, orientation, and degree parameters. From








Eq. (3-15), we have


Rf(a, b, d, 0)


(;) j/p, ),f(p, )f(p, O)pdpd

(b) '/ 2 (( pcsd((O )/d) b)/a)f (p, )pdpdO

= a- 1/2 ((r b)/a)(r p cod((O )/d))drf(p, )pdpd

= a-1/2((r -b)/a)x [ 6(r pcosd((0 )/d))f(p, O)pdpdc] dr

< (3-16)


where (a) is due to Eq. (3-15); (b) is due to Eq. (3-14); (c) is due to Eq. (3-4) and GRd[f]
is the generalized Radon transform (GRT) of function f. Eq. (3-16) shows that ripplet-II
transform can be obtained by the inner product between GRT and 1D wavelet, which
is the 1D wavelet transform (WT) of GRT of function f; i.e., the ripplet-II transform of
function f can be obtained by first computing GRT of f, and then computing 1D WT of
the GRT of f as below:


f(P, R) T 1DWT (,,,
f>(p, p) GRd [f(r, 0) > f (a, b, d, 0)


where the 1D WT is with respect to (w.r.t.) r.
In details, ripplet-II transform of f can also be obtained through

Rf(a, b, d, 0) = 2 a-1/2(r b)/a) f (p, O)e-idO
n -o (r/p 12
x(1 (r/p)2/d)-1/2 X Tnd((r/p) l/d)dpeindr


(3-17)


(3-18)


3.3.1.2 Inverse transform
Ripplet-II transform is invertible. Given ripplet-II coefficients Rf(a, b, d, 0), we can
reconstruct the original function f through
1 o)d [0 f2 [a00 fb, d, 0 r a
f(p, ) Fn -2[ dp Jo Jo 10 b_
f( )-2R(a, b, d, 8)=( )e-ine

x Tnd((r/p) 1d) X ((r/p)2d 1)- 1/d dadbdrdO]eino (3-19)
r









Reversing the process in (3-17), the inverse of the ripplet-II transform of function f can be

obtained by first computing inverse WT (IWT) of 7Rf(a, b, d, 0) w.r.t. a and b, and then

computing inverse GRT (IGRT) as below:

1D-IWT IGRT
Zf(a, b, d, 0) 1 GRd[f](r, 0) -= f(p, ) (3-20)


where IGRT can be computed by the method in Section 3.2, Eq. (3-11).

3.3.2 Continuous Orthogonal Ripplet-II Transform

As shown in (3-17), ripplet-II transform can be implemented as a 1D wavelet

transform along the radius of the generalized Radon domain. If we apply 2D wavelet

transform to the generalized Radon coefficients, the additional wavelet transform along

angle 0 holds the potential of improving the sparsity of transform coefficients. We call the

new extension orthogonal ripplet-II transform.

In mathematics, orthogonal ripplet-II transform of a function f(p, () in polar

coordinates is defined by


+0 f fl-,r bl_ (- b2
th(a, bi, b2, d) = 2 / a0) () f (p, )e-in'd
n--oo
x (1 (r/p)2/d)-1/2 Tnd((r/p) d)dpeinodrd0 (3-21)


Similar to ripplet-II transform, orthogonal ripplet-II transform of the function f can

be obtained by first computing GRT of f, and then computing 2D WT of the GRT of f as

below:



GRT 2D-WT
f(p, ) GRd[f(r, 0) rth(a, bl, b2, d) (3-22)


There is no direction parameter in orthogonal ripplet-II coefficients Rrth(a, bl, b2, d).

This may not provide explicit information about the directions of curves. However, due









to the additional wavelet transform along angles, sparser representation of functions is

achieved.

Orthogonal ripplet-II transform is also invertible. Given orthogonal ripplet-II

coefficients Rrrth(a, bl, b2, d), we can reconstruct the original function f through

f (p, 1) f0 00 ,orth(a, b b2, d)I( ) ( )e-ino
7F nZ_ dpjP Jo Jo I0 0 ab,2 bl )2

x Tnd((r/p)/d) x ((r/p)2/d )-1/21/ddadbidb2drd0]e(-23)
r

Reversing the process in (3-22), the inverse of the orthogonal ripplet-II transform of

function f can be obtained by first computing inverse 2D WT (IWT) of R rth(a, bl, b2, d)

w.r.t. a, bl and b2, and then computing inverse GRT (IGRT) as below:
Rorth 2D-IWT IGRT
(a, b1, b2, d) GRd[f](r, 8) f(p, o) (3-24)

3.3.3 Discrete Ripplet-II Transform

If the input of ripplet-II transform is a digital image, we need to use discrete ripplet-II

transform. Following the paradigm in (3-17), discrete Ripplet-II transform of function f

can be obtained by first computing discrete GRT (DGRT) of f, and then computing 1D

discrete WT (DWT) of the DGRT of f as below:

DGRT 1D-DWT
f(p, ) GRd[f]r, 0) f (a, b, d, 0) (3-25)

The discrete orthogonal ripplet-II transform follows the paradigm in (3-22) and is

obtained by

DGRT 2D-DWT
f(p, ) GRd[f](r, ) t (a, b, b2, d) (3-26)

If d = 2, there is a simpler method to computer discrete ripplet-II transform, the

details of which will be elaborated in Section 3.3.4.
























A B


G H


Figure 3-4.


Gaussian images with a curve edge. Top row: original image f(x, y); Middle
row: Magnitude of 2D Fourier transform; Bottom row: Magnitude of 2D
Fourier transform after substituting the polar coordinate (r', 0') with
( /, 0/2). Left column: Parabolic curve. Middle column: Curve with degree 3.
Right column: Curve with degree 4









3.3.4 Discrete Ripplet-II Transform with d = 2

For d = 2, the generalized Radon transform is called parabolic Radon transform [56].

Eq. (3-5) shows that for d = 2, the generalized Radon transform can be implemented via

the classical Radon transform with appropriate substitutions of variables. Hence, we can

compute discrete ripplet-II transform via Eqs. (3-25) and (3-5).

Computation of Forward Ripplet-II Transform with d = 2. The forward

transform can be obtained by the following steps.

1. Convert Cartesian coordinates to polar coordinates, i.e., convert f(x, y) to f(p, Q).
For f(p, Q), substitute (p, Q) with (p'2, 20'). Convert polar coordinates (p', 0') to
Cartesian coordinates (x, y), and obtain new image fi(x, y) by interpolation, where
x and y are integer-valued.

2. Apply classical Radon transform to fi(x, y), resulting in R(r', 0'). In function
R(r', 0'), substitute (r', 0') with (Vr, 0/2); and obtain the generalized Radon
coefficients GR2(r, 0) via Eq. (3-5).

3. Apply ID wavelet transform to GR2(r, 0) with respect to r, and obtain the ripplet-II
coefficients.

To show the sparsity of ripplet-II transform with d = 2, we plot Figure 3-4. As we

know, ridgelet transform of a 2D function f(x, y) is computed by 1) 2D Fourier transform

of f(x, y), 2) converting Cartesian coordinate system to the polar coordinate system

(J', 0'), 3) 1D inverse Fourier transform w.r.t. a', resulting in (r', 0'), 4) 1D wavelet
transform w.r.t. r'. In contrast, ripplet-II transform of a 2D function f(x, y) is computed

by 1) 2D Fourier transform of fi(x, y), 2) converting Cartesian coordinate system to

the polar coordinate system (G', 0'), 3) 1D inverse Fourier transform w.r.t. w', resulting

in (r', 0'), 4) substituting (r', 0') with (v/r, 0/2), 5) 1D wavelet transform w.r.t. r. The

key different between ripplet-II transform and ridgelet transform in their computing

procedures is that ripplet-II transform has an extra step, i.e., substituting (r', 0') with

( V, 0/2). If we apply 1D wavelet transform to the middle row in Figure 3-4, we obtain

ridgelet transform coefficients. If we apply 1D wavelet transform to the bottom row in

Figure 3-4, we obtain ripplet-II transform coefficients. It is observed that the Fourier









transform coefficients in the bottom row of Figure 3-4 are sparser than those in the

middle row of Figure 3-4. This is why ripplet-II transform provides sparser coefficients

than ridgelet transform. In other words, substituting (r', 0') with (V/r, 0/2) helps make

coefficients sparser.

Computation of Inverse Ripplet-II Transform with d = 2. The inverse

transform can be obtained by the following steps.

1. Apply 1D inverse wavelet transform to ripplet-II coefficients with respect to r,
resulting in GR2(r, 0).

2. In function GR2(r, 0)/(2/r), substitute (r, 0) with (r'2, 20'), resulting in R(r', 0').

3. Apply classical inverse Radon transform to R(r', 0'), resulting in fi(x, y).

4. For fi(x, y), convert Cartesian coordinates (x, y) to polar coordinates (p', ('),
resulting in fi(p', 0'). For function fi(p', 0'), substitute (p', 0') with ( 2 0/2),
resulting in f(p, 0). For f(p, 0), convert polar coordinates (p, 0) to Cartesian
coordinates (x, y), and obtain f(x, y) by interpolation, where x and y are integer-valued.

Computation of Orthogonal Ripplet-II Transform with d = 2. The

computation of orthogonal ripplet-II transform is similar to that of ripplet-II transform.

The only difference is to replace 1D wavelet transform with 2D wavelet transform.

Particularly, the forward orthogonal ripplet-II transform is implemented by first computing

Step 1 and 2, and then replacing 1D wavelet transform to 2D wavelet transform in Step3.

The inverse orthogonal ripplet-II transform is computed by first replacing 1D inverse

wavelet transform to 2D inverse wavelet transform in Step 1, and then following the

remaining steps.

3.4 Properties of Ripplet-II Transform

According to the definition, we can directly find the following properties about

ripplet-II transform

* Localization: Ripplet-II with degree d decays fast along curves of polynomial degree
d.

Directionality: Ripplet-II can be oriented toward arbitrary direction.











* Flexibility: Compared to ridgelet, ripplet-II provide flexible choice for degrees.
Optimal degree can be determined for specific applications.


Ripplet-11 d=1 (ridgelet)
S


Ripplet-11 d=2


Ripplet-ll d=3 Original image


A Phantom


Ripplet-11 d=1 (ridgelet)


RiDDlet-ll d=3


Ripplet-11 d=2


Original imaae


B Parabola

Figure 3-5. Ripplet-II with different degrees.



In Figure 3-5, ripplet-II representation of images with different degrees are presented.

It can be observed that different degrees present different levels of sparsity.









Figure 3-6 shows the magnitude of transform coefficients in a decreasing order
for wavelet, ridgelet, ripplet-II and orthogonal ripplet-II transforms; the magnitude of
coefficients of each transform is normalized by the largest coefficient of the corresponding
transform. It can be observed that ripplet-II has the fastest decay in coefficients,
compared to wavelet and ridgelet. This is the reason why ripplet-II transform can provide
sparser representation for images with edges than wavelet and ridgelet. In Figure 3-6,
orthogonal ripplet-II demonstrates faster decay than ripplet-II, which indicates that
orthogonal ripplet-II transform can provide sparser representation of functions than
ripplet-II as stated in Section 3.3.2.
Besides the aforementioned properties, ripplet-II transform can provide rotation
invariance. We show this as below. If we have an image fi(x, y) as well as its rotated
version f2(x, y) rotated by an angle a, i.e.,

f2(x,y) = fi(x cos(a) + ysin(a), -xsin(a) + ycos(a)) (3-27)

In the polar coordinate system, we have

f2(P,) f(p,- a) (3-28)

So ripplet-II transform of f2 is

Rf2(a, b, d, 0) = a,b,d,P, O)f2(p, )pdpdO

-= ,,(p, 3)fi(p, a a)pdpdQ

= J .i-a(P, ))fi,(p,O)pdpdQ

= Rf,(a,b,d, a) (3-29)

Applying 1D Fourier transform on both sides of Eq. (3-29) with respect to 0, we have

F{Ref(a, b, d, 0)} 1= J f(a, b, d, 0 a)e-ido = e- iwF{Rf(a, b, d, 0)} (3-30)
(7


























wavelet
rndgelet
rippletll
orthogonal rippletll


1000 2000 3000
Index of sorted coefficients


A Lena


4000 5000


wavelet
ridgelet
rippletll
orthogonal rippletll


1000 2000 3000
Index of sorted coefficients


4000 5000


B Barbara


Figure 3-6. Comparison of coefficient decaying between

orthogonal ripplet-II with d = 2.


wavelet, ridgelet, ripplet-II and


S101

E


10-2
o


10
N
-3



I 10-
z


100





10

re
E


10-2
10-

N

0 103
z









Obviously, we have IF{7R, (a, b, d, 0)} = IF{Rf (a, b, d, 0)}1; i.e., the magnitude

of 1D Fourier transform (w.r.t. 0) of ripplet-II transform is rotation invariant. Hence,

ripplet-II transform can provide rotation invariant features. Since there is no explicit

direction parameter in orthogonal ripplet-II coefficient, orthogonal ripplet-II transform

does not have the rotation invariant property.

3.5 Experimental Results

In this section, we evaluate the performance of ripplet-II transform in the problems

of texture classification and image retrieval, where ripplet-II transform serves as a feature

extractor. Our experiments use the texture volume in USC-SIPI image database [57].

The texture volume consists of 2 sub-databases, all of which contain monochrome texture

images.

3.5.1 Texture Classification

A sub-database named Rotated Textures [58] in the texture volume contains a

set of rotated textures. Each image in the sub-database is of size 512x152 pixels. The

sub-database contains a total of 13 textures as shown in Figure 3-7 and each texture

has 7 versions, which are rotated by 0, 30, 600, 900, 1200, 150, and 2000. Hence, the

sub-database contains a total of 13 x 7 = 91 images. Next, we describe the feature

extraction and classification algorithms used in the experiments.

Table 3-1. Information extracted from data
Notation Equation Description
Cl Yi R(i, ) first absolute moment
C2 NM EL Ei IR(i,j) (i,j)2 variance
C3 N NM _, IR(i, i)2 average energy
C4 P- kI Pi log Pi entropy


Feature extraction 1. We first partition the transform coefficient matrix into

ND nonoverlapped blocks. So the feature dimension is ND. The feature extracted from

transform domains is the statistical information from nonoverlapped blocks of size N x M.

Information such as first absolute moment, variance, average energy and entropy carries




































Figure 3-7. Textures used in texture classification.


geometry information from the original image as shown in Table 3-1 [59] The rotation

invariant property of ripplet-II transform guarantees that rotated images have almost

the same feature vector as that of the original image. The statistical information from

Table 3-1 can also provide rotation invariant property for wavelet transform approach.

Feature extraction 2. We apply a transform to each image and obtain a vector

of transform coefficients. Assume that we have Nt images for training. Then, we have

Nt vectors of transform coefficients, which form a matrix. We apply principle component

analysis (PCA) [60] to this transform coefficient matrix and obtain eigenvalues/eigenvectors

of the matrix. PCA provides a transformation matrix, which consists of principle

components (normalized eigenvectors); we multiply the transformation matrix and

a transform coefficient vector, resulting in a feature vector. We choose the feature
































Figure 3-8. Textures rotated with different angles.


dimensions that corresponds to the ND principal components whose eigenvalues are

largest. Hence, the resulting feature vector is ND-dimensional.

Classification algorithm. We use k-nearest-neighbor (kNN) classifier where k = 5.

The distance measure used in kNN is ND-dimensional Euclidean distance. A leave-one-out

cross-validation classification algorithm is used to evaluate the classification performance.

Specifically, we first compute the distance between a test feature vector and each of the

feature vectors with known labels, and then determine the class label of the test feature

vector by a k-nearest-neighbor classifier. Our performance measure is error rate, which is

the ratio of the number of mis-classified images to the total number of images tested.

We test four types of transforms, i.e., ridgelet, wavelet, ripplet-II and orthogonal

ripplet-II transform. Texture classification performances are listed in Table 3-2 and

Table 3-3 corresponding to feature extraction 1 and feature extraction 2, respectively.

Table 3-2 shows that ripplet-II transform achieves lower error rate than wavelet transform

under all feature vector lengths tested and under C1, C3, and C4; ripplet-II transform










Table 3-2. Error rate under different transforms using feature extraction 1
C1
Feature dimension ND Wavelet Ridgelet Ripplet-II Orth-Ripplet-II
1 0. 1.', 0.9341 0.2802 0.1099
4 0.0441 0.8462 0.0019 0.1319
16 0.0187 0.7912 0 0.0879
64 0.0057 0.8462 0 0.0879
256 0.0024 0.9121 0 0.0879
C2
1 0.2826 0. 121. 0.1715 0.1099
4 0.0929 0.3516 0.0728 0.1099
16 0.0536 0.4176 0.0608 0.1319
64 0.0374 0.5385 0.0580 0.1429
256 0.0172 0.5824 I I .'.I. 0.1099
C3
1 0.2883 0. 12-. 0.1844 0.1209
4 0.0072 0.3077 0.0024 0.1319
16 0.0010 1 ;: I, 0.0010 0.1099
64 0.0010 0. ;', 0.0010 0.1099
256 0.0019 0.5055 0 0.0879
C4
1 0.6624 0.8352 0.4061 0.5824
4 0.8841 0.8352 0.1049 0.3407
16 0.9176 0.7143 0.0532 1' -.7
64 0.9746 0.6923 0.2055 ii '.7
256 0.9909 0.6374 0.1173 7


achieves higher error rate than wavelet transform under C2 for feature vector length

larger than or equal to 16. Results also indicate that the entropy method C4 is not a good

statistical feature. Table 3-2 shows that ripplet-II transform outperforms other transforms

in almost all cases by achieving lowest error rate. However, orthogonal ripplet-II transform

is worse than ripplet-II and wavelet. Table 3-3 shows that ripplet-II transform achieves

lower error rate than ridgelet wavelet and orthogonal ripplet-II transform under all

feature dimensions tested. The reason why ripplet-II transform achieves the best

classification performance is two-folded. First, ripplet-II transform is able to provide

sparser feature vectors than ridgelet and wavelet transform. Second, the rotation invariant

property of ripplet-II transform guarantees that rotated images have almost the same

feature vector as that of the original image.









Table 3-3. Error rate under different transforms using feature extraction 2
Feature dimension ND Ridgelet Wavelet Ripplet-II Orth-Ripplet-II
1 0.3908 0.2639 0.1341 0.1758
2 0.2648 0.0752 0.0172 0.1538
4 0.2261 0.0527 0.0029 0.1538
8 0.1676 0.0350 0.0005 0.1319
16 0.1058 0.0177 0 0.1319
32 0.0704 0.0105 0 0.1319


3.5.2 Image Retrieval

We also conduct experiments to demonstrate the performance of Ripplet-II in content

based texture image retrieval.

A sub-database named Textures [50] in the texture volume contains 58 images,

each of which contains one type of texture. Among the 58 images, 33 images are of size

512x152 pixels and 25 images are of size 1024x1024 pixels. To test the rotation-invariant

capability of different transforms, we need to create rotated versions of the images in the

sub-database. To achieve this, we first rotate a texture image by angles from 0 to 350

with a stepsize 100; then we crop a patch of size 128x 128 pixels from the center region of

the rotated image. By doing so, we obtain 58 x 36 = 2088 images.

Image retrieval is done in the following steps. First, a test image is given as a query

to the image retrieval system. Second, apply a feature extraction algorithm (which is

the same as that in Section 3.5.1) to the test image, and obtain a feature vector. Third,

apply kNN classifier (where k = 35) to the test feature vector and the 2088 images

serve as training samples for the kNN classifier; the distance measure used in kNN is

ND-dimensional Euclidean distance. Assume that in the k images (which are output of the

kNN classifier), Nr images are rotated versions of the test image. We call Nr/k as retrieval

rate, which represents the success rate of image retrieval. We test four types of transforms,

i.e., ridgelet, wavelet, ripplet-II and orthogonal ripplet-II transform. Table 3-4 and 3-5

list average retrieval rate using different feature extraction approaches. Table 3-4 shows

that ripplet-II and orthogonal ripplet-II transforms have higher average retrieval rate than










wavelet transform in most cases. Ripplet-II and orthogonal ripplet-II transforms have

similar performance in C1, C2 and C3. Results show that C4 is not a good feature for

image retrieval. Table 3-5 shows that ripplet-II transform achieves higher average retrieval

rate than ridgelet and wavelet transform under all feature dimensions tested. Orthogonal

ripplet-II transform outperforms wavelet and ridgelet transform.

Table 3-4. Average retrieval rate under different transforms using feature extraction 1
C1
Feature dimension ND Wavelet Ridgelet Ripplet-II Orth-Ripplet-II
1 0.4174 0.4521 0.6312 0.8135
4 0.6669 0.6544 0.9948 11 '. 10
16 0.7034 0.6869 0.9959 0.9409
64 0.7016 0.6632 0.9927 i',
256 0.7382 0.6094 0.9876 0.9695
C2
1 0.6394 0.4489 0.7653 0.71
4 0.7182 0.6438 0.7883 0.7692
16 0.7591 0.6079 0.7T L1 0.8386
64 0.7",- 0.5237 0.7894 I ,
256 0.8070 0.4536 0.7905 0.8876
C3
1 fI i 1 0.3344 0.7581 0.7536
4 0.8511 0.5968 0.9299 0.7870
16 0.8384 1 ",,. 0.9413 0.9220
64 0.7863 0.5290 11 ', 0.9632
256 0.7532 0.4923 0.8701 0.7731
C4
1 0.1754 0.1810 0.5551 0.2421
4 0.0911 0.1529 0.7660 0.3226
16 0.0539 0.1203 0.7687 0.3963
64 0.0278 0.0883 0.4248 0.3810
256 0.0190 0.0507 0.5305 0.3749


Table 3-5. Average retrieval rate under different transforms using feature extraction 2
Feature Dimension ND Ridgelet Wavelet Ripplet-II Orth-Ripplet-II
1 0.5013 0.6426 0.803 0.8016
2 0.5773 0.8513 0.9495 0.8940
4 0.5921 0.8898 0.9831 0.9343
8 0.6343 0.9154 0.986 0.9831
16 0.7054 0.941 0.9869 0.9866









Compared to the database in Section 3.5.1, the database used here has smaller

in-class distance. Experimental results show that ripplet-II works well for both large and

small in-class distances. Orthogonal ripplet-II transform only works for small in-class

distance case.

3.6 Summary

In this chapter, we proposed a new transform called ripplet transform Type II

(ripplet-II) for resolving 2D singularities. Ripplet-II transform is basically generalized

Radon transform followed by 1D wavelet transform. Both forward and inverse ripplet-II

transform were developed for continuous and discrete cases. Ripplet-II transform with

d = 2 can achieve sparser representation for 2D images, compared to ridgelet. Hence,

ripplet-II transform can be used for feature extraction due to its efficiency in representing

edges and textures. Ripplet-II transform also enjoys rotation invariant property, which

can be leveraged by applications such as texture classification and image retrieval.

Experiments in texture classification and image retrieval demonstrate that the ripplet-II

transform based scheme outperforms wavelet and ridgelet transform based approaches.









CHAPTER 4
SPARSITY BASED DE-ARTIFACTING IN VIDEO CODING

4.1 Introduction

In most hybrid video coding schemes, various techniques are proposed to reduce

the spatial and temporal redundancy. Block motion compensation is used to reduce

the redundancy among frames and block transform is employ, ,1 to eliminate spatial

correlations. However, quantization of transform coefficients and block based processing

yield visual artifacts in the reconstructed video sequences. Discontinuities across block

boundaries, distortion around structure edges and within texture regions, and 'ringing'

effects are commonly observed for large quantization step sizes as shown in Figure 4.1. In

H.264/AVC, the state-of-the-art video coding standard, an in-loop deblocking filter [61]

is adopted to remove discontinuities at block boundaries. A low-pass filter is adaptively

chosen according to the characteristic of the block boundaries. The deblocking filter

removes most of blocky effects and improves the code efficiency. Nevertheless, the

other types of artifacts still remain, because the filter only affects pixels near the block

boundaries.

To remove more general compression artifacts, some sparsity-based denoising filtering

were proposed. [62] and [63] proposed local approaches that adapt to the nonstationary

of the content by applying a sparse image model to the local neighborhood. Multiple

representations of a pixel are achieved by an overcomplete set of transforms and a

thresholding operation. Then, a fusion process is applied to obtain the final denoised

pixel. Although these filters achieve superior objective and subjective qualities compared

to the H.264 deblocking filter, the transform set should be carefully selected based on the

characteristics of local neighborhood in order to efficiently decouple true signal and noise.

Another class of sparsity-based denoising approaches try to exploit the existing non-local

self-similarities within a picture or video. In [34], the Block-Matching and 3D filtering

(BM3D) non-local approach is proposed, which achieves state-of-the-art performance in


























A B


Figure 4-1. Coding artifacts: (A) Edge distortion. (B) Ringing effects. (C) Texture
distortion. (D) Blocky artifacts.









image denoising. In BM3D, similar blocks are found by block matching within the image

and stacked into a 3D array. Because of the similarity shared among similar blocks, the

generated 3D array signal is almost stationary along the stacking dimension, so it may

be sparsely represented by any standard basis (p.e., FFT or DCT). Then, a 3D transform

is applied to the 3D array signal followed by a shrinkage in the transform domain to

attenuate the noise. After the inverse transform, the processed blocks are put back to their

original positions. Because the blocks may overlap with each other, each pixel may have

more than one denoised estimate. The final estimate of a pixel is a weighted average of

the all overlapped estimates. In [35], Video denoising by Block-Matching and 3D filtering

(VBM3D) is proposed to extend BM3D to video denoising by introducing multiple

pictures in the search region. The work [36] went beyond VBM3D by extending the

operation atom from a 2D block to a more general 3D patch. Transform dimensionality

is also increased to 4D. This extension considers local motion features during the patch

clustering process. Finally, in [36], the whole denoising algorithm is formulated under a

variation ', i, -i ,i framework. A variational EM algorithm is applied to iteratively refine

the denoising result and patch clusters. This new approach achieves good performance

in denoising and other video processing applications. In this chapter, we put the above

three denoising approaches (2D local [63], 3D nonlocal [34], and 4D nonlocal [36]) into a

common video compression artifact removal application. The three denoising approaches

are applied to process the H.264/AVC compressed video sequences as a post-filter. First,

we unify them into a de-artifacting framework. After that, our goal is to investigate the

performance of the approaches under this common framework.

The remainder of this chapter is organized as follow. Section 4.2 introduces the

framework we set up for this investigation. Section 4.3 describes the de-artifacting

algorithms implemented in this framework. Section 4.4 analyzes the complexity of

proposed framework. Performance comparisons and analysis are presented in Section 4.5.









4.2 Framework

As explained in the introduction, the three sparsity based approaches are very

similar. All of them exploit the sparsity model of natural images; the shrinkage is done

in transform domain; multiple hypothesis fusion is involved to infer the final denoised

results. Although the approaches are similar in these core components, they vary a lot

in the details. In this section, we unify these denoising approaches into a framework for

the compression artifact removal application. A direct implementation of the introduced

denoising algorithms is not suitable for our goal. For example, the work in [36] exploits

an iterative EM algorithm to enhance the patch clustering and denoising results giving

better results. However, the complexity is too high for video compression, especially for

potential real time applications. In different algorithms, different transforms are used, so

it is hard to come to fair comparisons. In our framework, on one hand, we expect that the

de-artifacting may reuse existing resources of the H.264/AVC codec. On the other hand,

we have to align the algorithms as much as possible for a fair comparison. Therefore, a

unification effort is needed. The unification is done without changing the core idea of

the original algorithms, while putting them in the same baseline. In the next section,

we introduce the individual core components of de-artifacting algorithms that we have

implemented based on both the original counterparts and the above unification criteria.

4.3 Algorithm Description

First of all, we define a patch as a connected pixel set that covers space and time

within a video signal. The dimension of a patch may be 2 or 3 based on the definition. A

2D patch is very similar to the "block" widely used in most coding standards if we add a

spatial rectangular shape constraint. A 3D patch incorporates the temporal dimension,

which means a 3D patch may cover multiple frames of a video. Although the topology

of a patch may be arbitrary in space and time, we assume for simplicity that the spatial

and temporal shapes are rectangular. Under this assumption, a 2D patch is a block and

a 3D patch is a cube as shown in Figure 4-2. With the patch definition, a video sequence










frame


~7L:


2D patch





Figure 4-2. 2D and 3D patches in a video sequence.

Reference patch Local region


3D patch


Patch S(
Organizat


weights


Patch Se
Un-
organizatii


Figure 4-3. Flowchart of the proposed framework. 2D patches are used for demonstration.


is assumed to be composed of many overlapping patches, either 2D or 3D. The proposed

de-artifacting framework is shown in Figure 4-3. The key components include similar

patch set formulation (similar patch search and organization), sparsity enforcement

(transform and filtering), and multiple hypothesis fusion (weighted averaging), which

correspond to the core components of the three approaches. We elaborate on them in the

following subsections.


A
ion Transform



Transform
domain filtering

t
SInverse
Transform
on









4.3.1 Similar Patch Set Formulation

Although there is no explicit similar patch search process in [62] and [63], we can

assume the pre-fixed similar patch set (i.e., neighboring patches) is used. It does not hurt

to unify it into the general similar patch set formulation process. The similar patch set

formulation is explicit and very important for the works in [34] and [36]. In [62], [63], and

[34], 2D patch is the atom of the algorithm, which is different from that in [36], where a
3D patch is used. We define the following general similarity measurement for generating

the similar patch set:

S(Po) = {Pi e Pld(P,, Po)< Kmatch}, (4 1)

where S(Po) is the similar patch set of a patch Po, P is the whole patch set in the search

range, Pi denotes the i-th patch in P, d(Pi, Po) is the distance between Po and Pi, and

Match is the maximum distance. If the distance is smaller than Tmatch, then two patches
are considered "-i~i, I, If d(., *) is defined as the physical location of two patches in a

video lattice, the pre-fixed similar patch set in [62] and [63] is unified into Eq.(4 1). For

the algorithms in [34] and [36], we define d(., -) as Euclidean intensity distance of two

patches, i.e., d(Pi, Po) = IP, P01 In our framework, the search range may cover both

the spatial and temporal domain in order to fully exploit the video signal spatio-temporal

redundancies.

4.3.2 Sparsity Enforcement

There are different v--v to enforce sparsity within a similar patch set. In [62] and

[63], the transform is directly applied to each patch in the similar patch set. In the two

other approaches, patches in the similar patch set are sorted and packed to generate a

higher dimensional array signal. This process is demonstrated in Figure 4-4. Patches in

the similar patch set are sorted according to the similarity measure in ascending order.

Then the sorted patches are stacked to a higher dimension array. Because of the similarity

shared by these patches, the packed array signal exhibits near stationary characteristics















sorting


packing


Figure 4-4.


Patch sorting and packing. The
similarity. The lower the degree
reference patch.


degree of gray in patches indicates the
is, the more similar the path is to the


along the packing dimension. This kind of array signal is very easy to sparsely decompose

by any standard basis.

To suppress the noise in transform domain, various approaches can be used such as

thresholding or filtering [64].

Hard thresholding.









Hard thresholding sets all values less than or equal to given threshold to zero and

leaves other values unchanged.

= Y, Y> threshold

0, Y < threshold

Soft thresholding.

Unlike hard thl -I1i.liii:. soft thresholding also changes those values greater than

given threshold by subtracting the threshold value from them.

{ Y threshold, Y > threshold
Y = (4-3)
0, Y < threshold

Wiener filter.

Wiener filter [65] is used to filter out noise in a corrupted signal based on statistical

approach. The output of Wiener filter is written as

S max(Y2 a2, 0)
y=Y2 (44)
Y2

where Y is a noisy transform coefficient, Y is a filtered coefficient, and a is estimated

standard deviation of the noise (or artifacts).

4.3.3 Multiple Hypothesis Fusion

After filtering in the transform domain, an inverse transform is used to reconstruct

the filtered similar patch set. At this point, each patch in the original similar patch set has

a filtered version. Since overlapping of patches is possible, a pixel may appear in multiple

patches and a patch may appear in multiple similar patch sets. Each pixel may have more

than one estimate after all patches in a frame or video are processed.1 The next step is

to fuse all the estimates to get the final filtered pixel. In this chapter, we directly give the



1 Suitable selection of search range can guarantee the non-empty of S(Po).









fusion model as in Eq.(4-5). Interested reader can refer to [36] for the detailed derivation.


E[xly] = aiE[x Pi], (4-5)


where y is the noisy pixel, x is the clean version of y, Pi is the ith patch that has overlap

with pixel x. E[- ] is expectation operation. Weighting coefficients ai may be estimated by

the sparsity priors as in
1
ai- (46)

where NAz(i) is the number of non-zero coefficients of the transformed patch Pi. The

weights are estimated by the sparsity priors of transformed patch. We assume the

compression noise has little sparse characteristics, unlike the true image signal. We

give more weight to the patches that are more sparse because it is highly probable that

they provide a more accurate estimate.

4.3.4 Application to Compression Artifact Removal

In Subsection 4.3.2, we proposed the use of Wiener filter in the transform domain. As

shown in Eq.(4-4), we need to know a, the standard deviation of noise. In our work, a is

estimated at encoder and sent as side information to the decoder. a may be inserted in

the slice header or SEI (Supplemental Enhancement Information). The unified denoising

algorithms may be incorporated into a video coding scheme as a postprocessing tool as

shown in Figure 4-5 or as an in-loop filter as shown in Figure 4-6. In this chapter, we

incorporate them as a postprocessing filtering in order to check their artifact removal

performance. We compare the de-artifacting performance in the following sections taking

into account several aspects.

4.4 Complexity Analysis

The flexibility of patches and transforms gives rise to the necessity of analyzing

computational complexity of proposed framework. The 1i ii, Pr complexity of the framework

lies in two parts: searching and transforms.











Input
Vid Transform Quantization

De-Quantization


Inv. Transform
Entropy Bit
Coding Stream

Deblocking
Intra-frame Filter
Prediction


SMotion / Deartifacting Output
Compensation / filter Video


Motion
Estimation


Figure 4-5. Deartifacting filter as a post-processing tool in a video encoder.

Input
Video I Transform 3 Quantization

De-Quantization


Inv. Transform
Entropy Bit
-Coding Stream

Deblocking
Intra-frame Filter
Prediction
Deartifacting
Filter
Motion F i Output
Compensation Video



Motion
Estimation


Figure 4-6. Deartifacting filter as an in-loop tool in a video encoder.



4.4.1 Searching Complexity

To obtain the similar set, all positions in a search window are evaluated based

on similarity measure. Let's denote the dimension of a patch as N1 x N2 x N3. The

spatio-temporal search window has the size S1 x S2 x S3. For 2D patches, N3 = 1, S3 = 1.

The patches are partitioned in sliding window fashion with step sizes Nstep1 < N1 and









Nstep2 < N2, respectively. The cardinality of the similar set is limited to Ns. Then the

average complexity of searching for each pixel is

(Ni x N2 x N3+ ) x S x S2 x S3 (47)
NstepiX Nstep2

The first addend in the bracket is the computation for the distance calculation of each

patch. The second one is the cost to retain only Ns patches with smallest distances.

4.4.2 Transform Complexity

To compare the transforms of various dimensions, we limit the transforms to

separable transforms. Let C1D be the complexity of transform for unit length. Hence,

the complexity of all transforms with higher dimension can be written as a function of

C1D. The parameter that changes with dimensions of transforms is N3, which takes value

0 if 2D transforms used;

N3 = 1 if 3D transforms used; (4-8)

preset value if 4D transforms used.

The average transform complexity for each pixel is

NIN2CiD + N1N2CD + N1N2N3 CD + N1N2 N3 NsCI D
-------------(4 9 )
Nstepi1 X step2

The first three terms in the numerator are the transform complexity of a patch. The last

term is the cost of transforming along patches in the set.

From (4-7) and (4-9), we can tell that the complexity increases if we increase the

dimension of patches or transforms. So trade off should be made between the performance

and complexity.

4.5 Performance Comparison

In this section, we present simulations to evaluate the performance of the different

de-artifacting filters under the unified framework. The video sequences are first coded with

the H.264/AVC reference software JM14.1. Then, we apply the de-artifacting filters as

postprocessing to the reconstructed frames. The filtered frames are not used as reference









container earphone
Intra Only IPPP Intra Only IPPP
QP 2D 3D 2D 3D 2D 3D 2D 3D
22 1.24 1.39 0.16 0.29 0.58 0.47 0.22 0.13
27 1.04 1.13 0.08 0.18 0.69 0.55 0.23 0.13
32 0.94 0.98 0.04 0.10 0.73 0.61 0.20 0.11
37 0.84 0.87 0.05 0.08 0.65 0.61 0.18 0.12
Table 4-1. Performance (PSNR gain in dB) of 2D and 3D patches in videos with different
motion characteristics.


frames. The results are compared to those of H.264/AVC deblocking filter. Our test

configurations are Intra only and IPPP using QP = {22, 27, 32, 37} on High Profile.

The test video sequences cover from QCIF to HD (720p) resolutions. In the simulations,

a2 is estimated by the variance of the reconstruction error at encoder. We analyze the

simulation results from three points of view to clarify the advantages and disadvantages of

the features of each algorithm.

4.5.1 2D or 3D patch

An interesting point is the relationship between patch dimensionality and the

de-artifacting filtering performance. Intuitively, a 3D patch should provide more gain than

a 2D patch because it provides more overlapping that may lead to more accurate similar

patch set by introducing the motion feature into the search. However, because of the curse

of dimensionality, a 3D patch may provide even worse results than 2D patch, especially

when motion is complicated. Under our framework, we tested several sequences that have

either smooth or complicated motion. Two typical results are shown in Table 4-1. We

observe that a 3D patch does give much more PSNR improvement with respect to a 2D

patch in slow and smooth motion sequences (e.g., container). However, when motion

becomes complicated (e.g. the jittering in carphone), s 3D patch does not work. A similar

phenomenon is observed in mobile (smooth and slow motion) and bus (complicate and fast

motion).









4.5.2 Transform Dimensionality

As said in Section 4.3, patches in a similar patch set can be packed to generate a

higher dimensional data array. This data array can be easily decomposed by standard

basis since it is near stationary along the packing dimension. In this section, we see

that packing to higher dimension does provide more gain in most of the cases. In this

simulation, we use 2D patch and keep the similar patch set the same for fair comparison.

In one case, we pack the patches in the similar patch set into a 3D data array and use

3D DCT transform. In the other case, we apply 2D DCT to the patches in the similar

patch set individually without packing. Figure 4-7 shows the average PSNR gain of five

typical HD sequences (Crew, Night, Shuttlestart, C.:l, B:1.l,-':ps) for the intra only and

IPPP cases. Based on these results, we find that dimension increment improves a lot the

performance in all tested QP ranges.

4.5.3 Quantization Parameters

Since most of the compression artifacts come from the quantization process, we need

to investigate the impact of quantization to the de-artifacting algorithms. In H.264/AVC,

quantization parameter (QP) is a main controller of the quantization process. In this

section, we find out the relationship between QP and de-artifacting performances by

simulations. In Figure 4-8, 2D local, 3D nonlocal, and 4D nonlocal correspond to the

unified version of algorithms in [63], [34], and [36], respectively. Figure 4-8 shows the

average results of the above five HD sequences coded by intra only and IPPP coding

format. 2D local approach can only obtain minor gain since it only exploits spatial

information.

The two non-local approaches are more sensitive to quantization, since large

quantization incurs in more artifacts that degrade the accuracy of the similar patch set.

In IPPP case, with QP iii. i. -ii- there are more and more blocks that are predicted by

skip mode, which means that the prediction block is exactly the same as the reconstructed

block. As a consequence, these blocks predicted by skip mode are enhancing the artifacts,









because these copies are incorporated into the similar patch set with high probability.

However, in intra only case, there is no skip mode impact, and we can observe a relative

stable performance of the non-local approaches.

4.5.4 Visual Quality

Figure. 4-9 shows detailed crops from coded HD sequences. It is obvious that visual

quality is improved by proposed filter. In details, the edges of buildings is more regulate

than H.264. The 'ringing' effects around the black dots on the wall are suppressed mostly

as well.

4.6 Summary

This chapter reviewed and analyzed three similar sparsity-based denoising algorithms

and applied them to remove compression artifacts. First, we unified the algorithms into

single de-artifacting framework. Then, we compared the de-artifacting performance of

the algorithms considering three aspects: patch dimension, transform dimension, and

quantization parameter. During the comparison, simulations were performed to support

our analysis, which may provide guidelines for applying similar denoising algorithms to

video compression work in the future.









2D vs 3D Transform (intra only)


22 27 32 37


QP
Transform (IPPP)


0.2


0.15


0.1


0.05


0


I 2D Trasnform
3D Transform








V- I


QP


Figure 4-7. Performance of different transform dimensions.


2D vs 3D


0.25

0.2

0.15


0.05









Performance vs. QP (intra only)


22 27 32 37
QP
Performance vs. QP (IPPP)


22 27 32 37
QP
Performance vs. QP (IBBP)


22 27 32 37


Figure 4-8. Performance for different quantization parameters.
88


























































Figure 4-9. Visual comparison of detailed crops. Left column is coded by H.264. Right
column is filtered by proposed filter.



89









CHAPTER 5
SPARSITY ENHANCED VIDEO CODING TECHNIQUES

5.1 Introduction

Video compression techniques originate from image compression by considering

each frame as a single image. Image compression succeeds through exploring spatial

redundancy inside the image. Techniques such as transform, quantization and entropy

coding are key components in the practical codec. In a video sequence, there is additional

redundancy along the time line. Compression efficiency can be improved by reducing

temporal redundancy. Later, correlation among frames was explored through motion

compensation, which improved the performance significantly. The development in

techniques yields the international video compression standards. Two in i-r p1 l ,i- rs

are Motion Picture Experts Group (\! PEG), which developed MPEG-x standards and

Video Coding Experts Group (VCEG) from ITU-T targeting at telecommunication

application of video coding with H.26x standards. C('hi i !. ...ically, H.261 [66], MPEG-1

[67], MPEG-2/H.262 [68], H.263 [69] and MPEG-4 [70] emerge throughout 1990s. In

the early 21st century, MPEG and VCEG formed a Joint Video Team (JVT) to develop

new and powerful video coding standard, which led to the state-of-the-art video coding

standard, H.264 [71] [72] [73]. As a hybrid video coding scheme, H.264 incorporates a

lot of advanced video coding techniques including transform, motion compensation,

quantization and entropy coding. The flowchart in Figure 5-1 shows the process of

a typical H.264 encoder. First, a frame from the video sequence is partitioned to

nonoverlaped blocks of size 16x16, also called a macroblock ( \!I ). Macroblock is the

basic processing unit in a video codec. Second, a predicted version of current MB is

yielded through either intra prediction or motion compensation. The prediction error

which is the difference between current MB and predicted MB is the input of DCT. Third,

the coefficients of DCT are quantized by a quantizer. Finally, quantized coefficients are

encoded by an entropy code and binary bitstream will be the final output.










Input
Video Transform Quantization -

De-Quantization


Inv. Transform
Entropy Bit
4 Coding Stream

Deblocking
Intra-frame Filter
Prediction


Motion Output
Compensation / Video


Motion
Estimation

Figure 5-1. Hybrid video coding diagram.


New features introduced into H.264 over other standards are

Variable block-size motion compensation. A MB is partitioned into 16x16,8x16,16x8,8x8
blocks as shown in Figure. 5-2. Each 8x8 block is further divided into 8x4,4x8 and
4x4 blocks. Block based motion estimation is conducted and block pattern is
determined as the one with minimum prediction error. Large consistent MB tends to
choose large block size. Complicate motion area leads to small block size.

Quater-pel motion estimation. Sub-pel motion precision can recover the true motion
instead of integer pixel motion caused by sampling in time, which yields more
accurate prediction in motion compensation.

Directional intra prediction. For I frames, each block is predicted based on .I11i ient
preceded blocks. The prediction is directional. There are 9 modes for 4x4 blocks
shown in Figure. 5-3 and 4 modes for 16x16 blocks.

Multiple references. In motion estimation, the search window can include frames
before or after current frame, which introduces more candidates for accurate
predictions [74].

In-loop deblocking filter. Block partitioning yields noticeable discontinuous across
block boundaries called blocky artifacts. An adaptive filter is adopted to remove
these artifacts. Filtered frame further serves as reference frame for future frames [61].









* Tow methods of entropy coding, variable-length coding (VLC) and arithmetic
coding, are supported by H.264. Both entropy coding methods use context-based
adaptivity to improve performance over prior standards. They are termed as
context-adaptive VLC (CAVLC) and context-adaptive binary arithmetic coding
(CABAC), respectively.

* Network friendly data stream. To adjust to various network bandwidth, H.264 uses
Network Abstract L .,--r (NAL) to make the bitstream apparent to network so that
transmission is easy.


16x16


16x8


8x16


MB Types


8x8


8x4


4x8


8x8

0 1

2 3


4x4

0 1

2 3


8x8 Types


Figure 5-2. Partition of MB.


When evaluating the performance of a video codec, we can not simply compare the

single measurement such as Peak Signal to Noise Ratio (PSNR) and bit rate. It is more

reasonable to consider them both at the same time, which is also what rate-distortion

theory about is. If higher PSNR is desired, more bits are needed. All the parameters of

coding techniques are determined in a rate-distortion optimized (RDO) fashion, which

means tradeoff is made between PSNR and bit rate.


(5-1)


ME M-1 N-1
MSE = i, j)- (i,j) 2
i=0 j=0


where M is the height of video, N is the width of video, I is the original signal and I is the

reconstructed signal.


2552
PSNR = 10 log1lo M
MSE


(5-2)














8



>1



6



3 4

7 5

Figure 5-3. Intra prediction directions for 4x4 blocks.

In the scheme of H.264, only prediction errors are fed to transform and following

procedures for I, P and B frames. From the perspective of sparsity, most new techniques

contributing to the success of H.264 yield sparse representation in the transform domain.

It is promising in improving performance if more sparse coefficients are obtained from new

techniques.

5.2 Adaptive Block Pattern

5.2.1 Heterogeneous Block Pattern

In H.264/AVC standard [73], various block size only appears in motion compensation,

which means the intra coded frame (I-frame) only has 16x16, four 8x8 blocks or 16 4x4

blocks for each macroblock as shown in Figure 5-4A. For MBs with flat content, 16x16

block size is used; For MBs with details, 4x4 block size is used. However, it is not wise to

use uniform pattern for the whole macroblock (\!1 ). For each MB, the homogeneous block

pattern can not capture the variation of content adaptively. We propose a heterogeneous









block pattern for I frames, which can adapt to the local content structures within a

MB. The final block pattern forms a quadtree like structure. Figure 5-4 compares the

final block patterns of the same frame encoded by H.264 and proposed algorithm. The

results from proposed algorithm shown in Figure 5-4B demonstrate better adaptivity in

content structures. For flat areas, large block size (8x8 or 16x16) is chosen; For details,

4x4 blocks are used. For MBs covering objects boundaries, we can observe complicated

block patterns.

5.2.2 Implementation Details

Our implementation is based on JM14.1 [75], the official reference software for H.264.

We implement proposed quadtree structure pattern as additional mode to the existing

modes in H.264. To distinguish from existing modes, we need some flag and system, which

is called an overhead. We provide a simple method to encode the overhead. We add one

flag to indicate the adaptive mode as well as several bits to represent the block pattern in

the MB. The mode decision is conducted in a rate-distortion optimization fashion. The

overhead will be inserted in the MB l-v- r header in the bitstream. Syntax changes are

listed in Table 5-1.
Table 5-1. Syntax added to MB header.
variable name bits Descriptor
flag_QT 1 flag
QT_pattern 4 4 bits pattern, 1 bit for each 8x8


This method may not be optimal. The improvement of encoding overhead will be the

next step.

5.3 Enhanced Intra Prediction

5.3.1 Intra Similar-block Search

In H.264/AVC standard, intra prediction improves the RD performance significantly

and even achieves better performance over still image compression standard JPEG2000

[13]. Using boundary pixels from .,I.i ient blocks, the current block is predicted based

on designed directional methods. There are 9 directions for 4x4 blocks and 4 directions























































Figure 5-4. Block pattern comparison. (A) Homogeneous block pattern in H.264. (B)
Proposed heterogeneous block pattern.






95









for 16x16 blocks. The intra prediction in H.264 is actually model based linear prediction,

which works well when the signal is smooth. When the signal changes fast, the prediction

error will go large. Of course, the various block sizes may compensate the drawbacks

by using smaller block size to adapt to local structure. However, there is still large high

frequency components in prediction error.


Figure 5-5. Example of similar MBs. Two MBs indicated by block rectangles are very
similar to each other.


An alternative way to avoid that is to use data dependant prediction. In fact, there

exist similarities inside the same frame as shown in Figure. 5-5. We may benefit if we use

similar blocks to predict the current block. We can search for similar blocks in the frame,

like motion estimation among frames. Block based motion compensation is only used to

remove the redundancy in time domain. P and B frames search over reference frames to

get a matched block as prediction. Only the residual between original data and prediction









is fed to further processing. We can not call it as motion compensation as there is no true

motion inside a single frame. The idea is to utilize the self similarity inside one image.

5.3.2 Implementation Details

Our implementation is based on JM14.1, the official reference software for H.264.

Similar search is conducted for each block. The search window is limited to pre-coded

blocks only due to the causality. (Since we process one block after another, it is

impossible to predict using blocks in the future.) The similarity measure we used in

our implementation is Sum of Absolute Difference (SAD) between original block and

prediction, which is easier and faster to calculate. We can also use Sum of Squared

Difference (SSD) or other measures. The similar search will return two displacements

between the position of current block and that of matched block with minimal similarity

measure. The displacements need to be sent to decoder side as well as a flag indicating the

new prediction mode.

To encode the overhead, here is a simple method we implemented. First, we extend

the intra prediction mode with an additional mode. If the block is chosen as enhanced

prediction mode, displacements will be followed. This method is not the optimal design

for encoding the overhead. The overhead will be inserted in the MB lv--r header in the

bitstream. Syntax changes are listed in Table 5-2:

Table 5-2. Syntax added to MB header.
variable name bits Descriptor
flagEn 1 flag
disphori variable horizontal displacement
dispvert variable vertical displacement


5.4 Experimental Results

Experiments are conducted to compare the performance between proposed algorithms

and H.264. Our test configurations are Intra only using QP = {22, 27, 32, 37} on High

Profile with other settings shown in Table. 5-3. The test video sequences cover from QCIF

to HD(720p) (Crew,Night, S',,l.i'' start, Ch:l; Big ships) resolutions. The average PSNR









and bit-rate is calculated for each setting [76]. The experiment settings are listed as follow.


Table 5-3. Encoding configuration
Variable Value
Frames coded 30
Profile High
QP 22,27,32,37
Entropy coding CABAC
RD optimization enable


5.4.1 Heterogeneous Block Pattern

To demonstrate the upper bound of the performance of the proposed algorithm, we

implement the algorithm without introducing overhead for the adaptive block pattern

while we use the optimal pattern in coding the frame. The RD plots are shown in

Figure. 5-6. There is an improvement 0.7 dB achieved by the adaptive block pattern.

Figure. 5-7 shows the performance of proposed algorithm and H.264. Proposed algorithm

has 0.1 dB gain over H.264. The overhead leads to the degrading of gap from Figure. 5-6.

5.4.2 Enhanced Intra Prediction

To demonstrate the upper bound of the performance of the proposed algorithm, we

implement the algorithm without introducing overhead for the displacements. The RD

plots are shown in Figure. 5-8. There is an improvement 0.6 dB achieved by enhanced

prediction.

Figure. 5-9 shows the performance of proposed algorithm and H.264. Proposed

algorithm has 0.1 dB gain over H.264. The overhead leads to the degrading of gap from

Figure. 5-8.

5.4.3 Combination of Algorithms

In this experiment, we combine the adaptive block pattern and enhanced intra

prediction together to evaluate the performance. The first experiment aims to find out the

upper bound of the combination. The performance comparison is shown in Figure. 5-10.

There is 1.3 dB PSNR gain over H.264. Results show that these two algorithms do not










foreman


rO 38
Z
CO

36


34
H.264
-- Proposed w/o overhead
32 -
500 1000 1500 2000 2500 3000 3500 4000
Bit-rate(kbps)


Figure 5-6. RD plots of H.264 and proposed algorithm without overhead.


conflict with each other. Adaptive block pattern and enhanced intra prediction indeed

both improve the RD performance. Figure. 5-11 shows RD plots with overheads in

bitstream. The combination of two algorithms will introduce two overheads. That is why

the gain over H.264 is smaller than 0.1 dB.

5.5 Summary

We proposed two algorithms to improve the sparsity of prediction error in video

coding. Adaptive block pattern yields quadtree-like heterogeneous blocks in a MB which

captures the local structures. Enhanced intra prediction explores similarity inside a frame

and use data dependant nonparametric prediction to achieve more accurate prediction.

Experimental results demonstrate gain over H.264.


























foreman


- H.264
S-*- Proposed with overhead


1000 1500 2000 2500
Bit-rate(kbps)


3000 3500 4000


Figure 5-7. RD plots of H.264 and proposed algorithm with overhead.


32
500



























foreman


O 38
Z


36



34
--- H.264
-- Proposed w/o overhead
3 2 I I I I I
500 1000 1500 2000 2500 3000 3500 4000
Bit-rate(kbps)


Figure 5-8. RD plots of H.264 and proposed algorithm without overhead.



























foreman


rO 38
CO


36



34
S--- H.264
-- Proposed with overhead
32------
500 1000 1500 2000 2500 3000 3500 4000
Bit-rate(kbps)


Figure 5-9. RD plots of H.264 and proposed algorithm with overhead.
































foreman


- H.264
S-*- Proposed w/o overhead


500 1000 1500 2000 2500 3000 3500 4000
Bit-rate(kbps)


Figure 5-10. RD plots of H.264 and proposed algorithm without overhead.


I I I


I I I


I I I I




























foreman
44



42



40


r5
r 38
z
CO


36



34
E-- H.264
S-*- Proposed with overhead
32 1 1 I I I
500 1000 1500 2000 2500 3000 3500 4000
Bit-rate(kbps)


Figure 5-11. RD plots of H.264 and proposed algorithm with overhead.









CHAPTER 6
SPARSITY BASED RATE CONTROL IN VIDEO CODING

6.1 Introduction

Rate-distortion theory serves as the theoretical foundation of lossy data compression

[77] [78] [79] [80] and rate control in video coding. It addresses the problem of determining

minimum sources requirement given a tolerable distortion. It provides the theoretical

bound for practical problems.

X Y
encoder decoder *


Figure 6-1. Data compression system diagram.


In rate-distortion theory, rate denoted by R is defined as the number of bits used

for per data sample. Distortion measure usually uses mean square error (\! l;). A

distortion function d(., -) maps pairs of original and reconstructed sources (X, X) to a set

of non-negative real numbers.

d : Xx X R (61)

In a system described by Figure 6-1, we can obtain a set of pairs of (R, D) by tuning

the parameters of encoder. Given source X, a rate-distortion pair (R, D) is said to be

achievable if there exists a function mapping X -i Y with rate R and d(X, X) <

D. The closure of the set of achievable rate-distortion pairs is called rate-distortion

region. Two functions can be derived based on the definition of rate-distortion region.

One is called rate-distortion function R(D) which is the infimum of rates R such that

(R, D) is in the rate-distortion region of the source for a given distortion D. The other is

distortion-rate function D(R) which is the infimum of distortions D such that (R, D) is in

the rate-distortion region of the source for a given rate R.

Figure 6-2 demonstrates the rate-distortion region for a Gaussian source. The gray

area denotes the achievable rate-distortion region. No compression system performs

outside the gray area. The rate-distortion function at boundary of the rate-distortion










region is the lower bound. The closer a practical system is to the bound, the better it

performs.


3.5


3


2.5


2

rI


0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
D/o2


Figure 6-2. Rate-distortion function for
region is the gray area.


a Gaussian source. Achievable rate-distortion


For quadratic distortion, only a Gaussian source has a closed form rate-distortion

function. For other sources with Laplacian and generalized Gaussian distributions, the

rate-distortion functions can be found through numerical optimization algorithms [81].

The rate distortion function specifies the minimum rate required for a given tolerable

expected distortion. The distortion rate function specifies the minimum distortion

achievable under a given rate budget. There is alv--- a trade-off between the distortion

and the rate. If we extend the scenario to multiple random variables/sources, we can

formulate the problem as


N N
min D(Ri) s.t. R < Rt
=1 ii- 1


(6-2)









where N is the number of random variables, Di(R) is the distortion rate function of the

ith random variable (i = 1, N), and Rt is the target budget of bits for the N random

variables.

Video compression problem can be formulated as Eq. (6-2). Different sources are

usually frames in the video sequence. In video coding, two typical applications, DVD

and IPTV, correspond to the two problems respectively. In DVD application, the viewers

expect good and consistent visual quality across frames in the whole movie. The rate

for DVD applications is usually at 3 Mb/s and up. In IPTV application, users share the

bandwidth of Internet. The bandwidth for each user is small, thus video streams have

limited rates. The encoder has strict constraints on the rate, which is typically at the

order of 300 kb/s; otherwise, users in the same networks will be affected. IPTV usually

presents large variations in quality across frames due to the strict limitation in rates.

To address the new problem, we introduce constraints on distortion fluctuations.

N N
min Di(Ri) s.t. Ri < Rt and |AD < (i = 1, N 1), (63)
{Ri,}' 1 i 1I i=1

where ADi = D;+ Di (i = 1, N 1), and 7D is the maximal tolerable fluctuation in

distortion.

Taking the constraint on distortion fluctuations into account, we explore the

relationship between the rate and the change of distortion and derive a series of theorems,

which characterize an Gamma Rate Theory of rate-distortion, i.e., reducing (in, i.

respectively) the rate will result in increased (reduced, respectively) distortion fluctuation,

or vice versa. The gamma rate theory tells that there is a fundamental tradeoff between

the rate and the distortion fluctuation. We will also use experiments to validate the

gamma rate theory.

The remainder of the chapter is organized as follows. Section 6.2 presents the gamma

rate theory. In Section 6.3, we apply the gamma rate theory to the design of sparsity









based rate control algorithms where various rate-distortion models are used. Section 6.4

summarizes this chapter.

6.2 Gamma Rate Theory

In this section, we present theoretical analysis of the rate-distortion problem. Based

on the assumption of Gaussian sources, we first present an optimal solution without new

constraints. Then a theory is proposed to analyze the relationship between rate and

distortion fluctuation.

6.2.1 Reverse Water-filling

If the source is random variable drawn from i.i.d. Gaussian distribution N(0, u2),

based on rate distortion theory, the minimal rate needed to achieve given distortion D is


Silog ", 0< D < 2
R(D) = 2 D (6-4)
0, D > a2

Equivalently, we can obtain the distortion rate function


D(R) = a22-2R (6-5)


Denote {Xi}, as a zero-mean Gaussian process, with E[X2] = a2 (Vi). To find the

rate-distortion function for the process, the problem can be formulated as below

N N
min R,(D,) s.t. Di < Dmax (6-6)
(D,}N'1 i-i i-i

The solution to (6-6) is [80]

N 2
R(D)= ilog (6-7)
i-i

where Di = min(a2, A) and A satisfies >i Di < Dmax. (6-7) is called reverse water-filling

solution.









U 05
04
07
01

















X1 X2 X3 X4 X5 X6 X7

Figure 6-3. Reverse water-filling for 7 Gaussian sources


In Figure 6-3, we demonstrate the solution Eq. (6-7) for the case of 7 sources. The

dashed parts are the distortion for each source. It can be observed from Figure 6-3 that

the maximum distortion is the variance of the source.

Reverse water-filling strategy is the optimal solution for multiple sources. Reverse

water-filling strategy is non-causal, i.e., the current decision uses information about future

input. When determining the rate for the ith frame, the algorithm has to provide A which

guarantees >i Di < Dmax. The non-causal property is impossible for real time applications

such as video conference. Even for applications like DVD, this algorithm will introduce

extremely high complexity to gather information from all frames.

In practical systems, resources like bandwidth, d. 1iv are alhv-- easy to describe than

distortion. Real problems are usually formulated as (6-2). Moreover, a causal control

strategy is albv--i preferred. The gamma rate theory is intended to characterize the

tradeoff of rate fluctuation and distortion fluctuation under a causal control strategy.



109









6.2.2 Gamma Rate Theory

In this section, we provide theoretical analysis of solution to (6-3). We first define a

few concepts as below.

Definition 1. {X,;}}, is said to be an independent Gaussian sequence if Xi (i = 1, N)

are independent Gaussian random variables and X, has fixed variance ao (i = 1, N).

Definition 2. A rate allocation str i t1, R (-' ....:I' by {Ri}i 1) for an independent

Gaussian sequence {Xi}iN is said to be R-D optimal if the resulting distortion for Xi is

Di(Ri) (i = 1, N) where Ri > 0 (i = 1, N).

Definition 3. A rate allocation strl, i1,1 R ('i" .:' by {Ri}N,) for an independent

Gaussian sequence {Xi},1 is said to be causal under rate constraint R if
n
Ri < nR, n=l, --- N,
i-i

Ri > 0, i = 1, -- N.

Definition 4. An independent Gaussian sequence {X;i} is said to be controllable w.r.t.

R and TD (R > 0 and 7D > 0) if there exists a causal R-D optimal rate allocation str iI ,'I,

R such that |AD (7R) < 7y (Vi), where


ADi(R) = Di+,(Ri+) Di(Ri), i = 1, ,N 1,
n
Ri < nR, n=l, --- N,
i=1
Ri > 0, i = I, -- N.

Definition 5. (R, 7D) is said to be controllable w.r.t. an independent Gaussian sequence

{X;i}N if {Xi}iN is controllable w.r.t. 7D and R.
Definition 6. For an independent Gaussian sequence {Xi},1 its controllable region

in R- D plane is the closure of the set of all controllable (R, 7D) pairs. Denote the

controllable region by C{x,}.















3
max
2.5

2

S1.5

1

0.5

0

-0.5

-1
0.5 1 1.5 2 2.5 3 3.5
R


Figure 6-4. Controllable region in R-7D plane.


Definition 7. The i.1,1,,ir rate function F(R) of {Xi}i 1 is the infimum of all 7D such

that (R, 7D) is in the controllable region of {Xi}," for a given R.

Definition 8. The rate .ia,,,,,,.r function R(7D) of {Xi}i 1 is the infimum of all R such

that (R, 7D) is in the controllable region of {Xi}",1 for a given 7D.

Remark: Just like a distortion rate function, there exists a gamma rate function that

serves as the boundary of the controllable region of {Xi}i as shown in Figure 6-4.

Theorem 6.1. Given an independent Gaussian sequence {Xi},J with variances {u7}1,,

denote the minimum of {a}, by ain = minie(i,...,N} o The g.r,,i., rate function F(R)

has the following properties:

1) (R) > 0.

2) F(R) is a decreasing function for R E [0, R), where


S= min R; (6-8)
RE{R:F(R)=O}










1 /F(nJ" (69
= max log2 v11 2 (6-9)
n=l,,N 2 min
) F(R) = 0 for R [R, oo).

Proof. F(R) is defined by

F(R)= max IDi+i(Ri+1) Di(Ri)| (6-10)
i{1,--- ,N 1-}

1) From (6-10), it is obvious that F(R) > 0.
2) Now we prove that F(R) is a decreasing function for R C [0, R), where R =

minRe{R:r(R)=o} R.
For two arbitrary points R1 and R2 in [0, R). Without loss of generality, assume
RI < R2. In case there is only one source (- w, source k) achieving the maximum value of
ADi, i.e., k = arg maxi;e{,...,N-_ } AD;l, we let R1 = R1 + e, where c > 0 and C is so small
that c bits can only be used to reduce |ADk|, while the resulting ADbk| under R1 is still
the maximum among {I AD } under Ri. So F(R1) = ADkl since | A k is the maximum
among { Abil}; and |ADkl < IADkl since lADkl is reduced to IDk due to C bits. Since

F(Ri) = IADkl > lADbk = F(Ri), we have F(RI) > F(Ri).
In case there are L sources achieving the maximum value of ADi, i.e., kj =
arg maxi;{,....,N- } ADi; (j = 1,. L), we let R = R + e, where e > 0 and e is so
small that there exist {c}j l (E i cj = e) and a bit allocation strategy that guarantee
that Cj bits (j = 1, .. L) can only be used to reduce lADk while the resulting lADk,
under R1 is still the maximum among |ADil under Ri; in other words, the set of sources
achieving the maximum value, i.e., I = {i* : i* = arg maxi;e(,...,N-1} Dil|}, contains the L
sources (sources {k}jL1) and the cardinality I1I > L; in other words, there could be new
sources achieving the maximum value. Since F(R1) = lAD, > lAD, = r(Fi), we have

F(RI) > F(R1).









Following the same proof for F(RI) > F(R/), for a sufficiently large M, there exist

{Rif"2 and {Fi such that F(R1) > F(R) > r(R2) > ... > r(Rm) > F(R2), where
R/i+ = Ri + Ei (i = 1, ... M 1) and R2 = RM + EM-
Since for two arbitrary points RP and R2 in [0, R) and Ri < R2, we have F(R1) >

F(R2), hence F(R) is a decreasing function in [0, R).

3) Now we prove that R = maxni,...-,N log2 and F(R) = 0, where
R = minRE{R:r(R)o0} R. There are two steps to finish the proof. In the first step,
we prove there exists R such that F(R) = 0. In the second step, we prove that

R = maxn 1,-,N log2 where R = minRe{R:F(R)=o} R and F(P) = 0.
1 10-2 Then we have
Firstly, let R = man 1,-... ,Nv o~ Then we have

n 1
R > 1og2 2 n={1,... ,N} (6-11)
2 0 min

Then

2,,m 2-2R n = {1, N} (6-12)

So there exists a D such that


in >D > 2-2R n i n={1,... ,N} (6 13)
i 1

Therefore, we have
1 ln 02
1log2 1 < nR, n = {1,-.. ,N} (6-14)
2 Dn
Then we have

E(2log2 D) i-i
2N
Then we can find a rate allocation strategy R with {Ri = log 2 1 This strategy

is a causal and R-D optimal strategy verified by Eq. (6-15). {R, = i 1og02 Ni implies
consistent distortion for all sources, which is Di = D, i E {1, N}. Then ADi(R) =









0, i E {1, N 1}. So we have F(R) = maxi=1,...,Nl iADi(R)| = 0. So there exists R

such that F(R) = 0. Here R = maxn i,.-,N log, V2
'm7in o_
1~'- where52,1,1 R = minRE{R:F(R)=0}R
Secondly, we prove that R = maxn=1,... log where miR{R(R)}

and F(P) = 0. We prove this by contradiction.

Assume P < maXn,=1.,N 10og2 Then there must exist a k E {1, N} such

that
k /k
R < 1og2 (6-16)
2
k
2 0.rin

S2 <2-2 k 2 (6-17)
imn

Since F(P) = 0, then there must exist a causal rate allocation strategy R with

rates {Ri}I 1 and distortions {D },I for P such that max/ 1,...,N-1 |AD(7Z)| = 0. Then

|AD|i = 0. So Di+1 Dil = 0, i = 1, N 1 and denote D = Di, i = {1, J N}.

The causal strategy R will have


(10og2-) < nR n=-{1,- ,N} (6-18)
i=2
Rate distortion function of Gaussian sources implies that


Di,
Then we have

b< 72 i={1,- N} (6-20)

Then we have

-D m in (6-21)

Combining Eq. (6-18) and Eq. (6-21), we have


in> D> 2-2R i2, n ={1, N} (622)
i 1









Eq. (6-22) conflicts with Eq. (6-17), so the assumption of R < maxn 1,...,N log2
1cin
is not correct. Then we must have R > maxn i,... ,Nv og2 V Since we have proven

that F(R) = 0 for R = maxn=,...N log2 Vj i and R = minRe{R:F(R)=o} R, we have

R = maxn=1,..-,N 2 log2 -2

4) Finally, we prove F(R) = 0 for R e [R, oc).

Assume a causal rate allocation strategy R for R with rates { Rji}i and distortions

{Di} iN, where D = Di i {1,. N}. For an arbitrary R* = R + e where c > 0,

we can find a rate allocation strategy R* with rates {R*},) and distortions {D },"i.

When changing from R to R* we add c bits to each source. That is R* = Ri + e for

i {1, N}.

Thus

RR;= (R + ) = R + n (6 23)
i-1 i-i i-1
Since Ri < nR, we have
n
Y R* < n(R + e) = nR* (6-24)
i-i 1
So R* is causal and R-D optimal rate allocation strategy.

The distortion of R* are


D* = o2 2-2Ri = Oa22-2(R,+c) = 2-2(Di (6-25)


Since D= Di i {1, },wehave


D* = 2-2bD, i {1, ..., N} (6-26)


Then

ADi(7*) = Di Di = O, i {1, N 1} (6-27)

So

F(R*)= max |ADi(*)l = 0 (6-28)
iE{ ,---... ,N 1









Since F(R*) = 0 for an arbitrary R* > R, we proved F(R) = 0 in R e (R, oc).

D

Corollary 1. F(R) is a non-increasing function in [0, oc).

Proof. According to the properties 2) and 4) in Theorem 6.1, F(R) is a non-increasing

function in [0, oc). Specifically, F(R) is a decreasing function for R E [0, R), where

F(R) = 0; and F(R) = 0 for R e [R, oc). E

Corollary 2. The maximum of F(R) is achieved at R = 0.

Proof. According to the properties 2) and 4) in Theorem 6.1, F(R) is a non-increasing

function in [0, oc). So F(R) < [(0), R E [0, oc). Then [(0) = maxRe[o,,) F(R). D

In a real codec, the achievable number of bits is not continuous but discrete. There

might be control errors between the target rate and the real rate after encoding. Then

Theorem 6.1 will be modified with errors in rate.

Theorem 6.2. An independent Gaussian sequence {Xi}iN is encoded subject to causal

rate constraints 1 _1 R, < R (Vn E {1, 2, .. N}) and there are random values

{En, n = 1, 2, N} such that R1 + E1 < R and R, + En < nR y-11 R (Vn e

{2, N N}). {En, n = 1, 2, N N} are n ,,,:..i ,,l i distributed between -Emax and Emax,

where 0 < Emax < R. The following properties still hold.

1) F(R) > 0.

2) F(R) is a decreasing function for R E [0, R), where R = minRE{R:r(R)=o} R

3) F(R) = 0 for R E [R, oc).

Proof. To prove the gamma rate theory with control errors, we try to convert the control

errors to the variation of sources. Given {XJ;}, and {E;, i = 1, 2, N}, there exists an

equivalent source {X1}i1 with variances {r2}, which satisfies

1 2a ,-2
R, + Ei = log2 i = 1, (629)
2 D









For a fixed distortion D, the equivalent source has relationship with the original source as


a; = 2E'ca (6-30)


Then the proof reduces to the proof of Theorem 6.1 with modified sources {Xi}i As

long as the original source {Xi}i, is controllable, Theorem 6.2 is correct. O

Buffering does not change the gamma rate function.

In the practical video systems, there is ahv-l a buffer available for rate control

algorithms. The buffer enables the current frame to borrow bit budget from future frames.

Intuitively, we may think that the buffer will improve the performance of F(R). However,

a buffer does not help here.

Theorem 6.3. An independent Gaussian sequence {Xi}iN, is encoded by a R-D optimal

rate allocation str l' in' {Ri);} with a buffer, which is Y' i Ri < nR + B, where B is the

buffer size and n E {1, N}. The buffer will not I1,y,. F(R).

Proof. We rewrite the strategy as
n B
SRi < n(R + ) ne {1,--,N} (6-31)
i-=

When n -i o, B -- 0. The strategy with buffer reduces to a bufferless causal strategy

.,-vi~,l". ically. Then F(R) of the strategy with buffers stays the same as the bufferless

strategy. O

Theorem 6.1 provides a fundamental tradeoff between user friendliness and network

friendliness. User friendliness means small distortion fluctuation described by the

maximum of |ADjl; network friendliness means small data rate, since small data rate

will consume small bandwidth and yield small congestion in the network.

Theorem 6.1 also implies that a rate control algorithm will suffer large distortion

fluctuation if the rate is reduced. In other words, if a rate control algorithm achieves high









average PSNR, it will save bits equivalently. Then the reducing of rate will lead to large

F(R).

The gamma rate function F(R) provides the theoretical bounds for the maximum of

|ADi; given a target rate in rate control algorithms. Just like the rate distortion function,

the gamma rate function can also serve as the benchmark for all rate control algorithms.

To evaluate a rate control algorithm, we should combine the gamma rate function

and the rate distortion function. In the triplet (R, D, 7D), one of them should be fixed to

compare the others. For instance, the algorithm with better R-D performance is better

with fixed 7D. In next section, we will present a sparsity based rate control algorithm that

achieves better R-D performance with fixed 7D than H.264.

6.2.3 Simulation Results

To support the gamma rate theory, we demonstrate the non-increasing function

F(R) described in Theorem 6.1 by simulations in Matlab. The test data is Gaussian

sources {Xi}NI with zero means and variances a2, i = 1,..., N. In the simulation, we

choose N = 100. The plot of F(R) is shown in Figure 6-5. The F(R) function is indeed a

non-increasing function in [0, oo). When R is sufficient large, F(R) = 0.

6.3 Sparsity Based Rate Control

6.3.1 Rate Control in Video Coding

In practical video applications, rate control is a very important component. Rate

control module guarantees that the coded bitstream meets the target rate. In the

meanwhile, the total distortion is expected to be minimized. This can be formulated

as

min Di s.t. Ri < Rt (6-32)
i i
where Rt is the target rate and Ri and Di denote the rate and distortion for the ith frame.

The solution to (6-32) will be the optimal bit allocation among frames. However,

it is difficult to find a closed form rate-distortion function for a practical video codec.

The reason is that the source is not Gaussian distributed and the coding scheme is not





















4-

3-

2-
1-

0-

-1
0 50 100 150 200 250 300 350 400
R


Figure 6-5. F(R) function.


optimal. Another problem is that the number of bits for each frame including bits for

residual and bits for motion vectors and side information(e.g. mode, type, etc), which

are difficult to model. The closed form solution can not be determined as Eq. (6-7). In

literature, various models are introduced to approximate the rate distortion function of a

practical video coding system. For example, p domain model [82], [83], linear model [84]

and quadratic model [85] are proposed to model the functions between coding parameters

and rate and distortion. Model parameters are estimated from video data. The accuracy

of parameter estimation depends on the information gathered. There are two 1i i i"

types of strategies. One type is two pass or multiple pass, which is a non-causal solution

to (6-32). The video sequence is first encoded to collect statistical information, which

are used to estimate model parameters. Coding parameters for the whole sequence are

determined by solving (6-32) using the estimated model. Several passes may be taken to

reach a stable estimation. In the last pass, the video sequence is re-encoded using coding

parameters determined by optimal solution to (6-32). The problem with this approach is









the extremely high complexity and long delay. The other is single pass, which is a causal

solution to (6-32). The video sequence is encoded only once. All the model parameters

are estimated based on up-to-date data. The coding parameters are determined frame by

frame. For applications with constraints on delay and complexity, single-pass strategy is

preferred. Single-pass strategy has poorer performance than two-pass or multiple pass due

to the lack of prior knowledge for future frames. If we can have additional information

about future frames, the performance is expected to be improved.

For single-pass rate control, we proposed a sparsity based algorithm to improve the

performance. In H.264, a MB is said to be SKIP mode if all transform coefficients are

quantized to zeros. Denote Sn = [sl, ..., sM] where si = 0, if the ith MB has SKIP mode;

otherwise, si = 1. Let p' = S-. We can find that p' is similar to the definition of p

in [82]. The difference is that p' denotes zero ratio in MB level and across frames, and

p denotes zero ratio in pixel level and inside one frame. Frames in a video sequence have

high temporal redundancies. Then we find that vector Sn is sparse, especially when the bit

rate is low. In other words, two .,.i ,i:ent frames are very similar. The similarity can be

described by p'. It is reasonable to -: that two frames with a large p' are much more

similar than those with a small pT. The similarity can help to model the rate distortion

function of the real codec accurately. By sparsity based rate control, we mean that in

this algorithm, we leverage the high percentage of MBs with SKIP mode. With accurate

models, we can find a practical closed form solution to (6-32).

Once the bit allocation is determined by solving (6-32), the next step is to map the

target rate to coding parameters (i.e. Quantization Parameter (QP) in H.264). The H.264

standard has introduced rate-distortion optimized (RDO) motion compensation with

multiple reference frames and mode decision among various intra- and inter-prediction

modes. These new features have improved the coding efficiency significantly. However,

RDO make the rate control more complicated. For instance, rate control algorithm needs

prediction residual information to determine QP. However, prediction residual can not be









determined by RDO without knowing QP. This leads to a 'chicken and egg' dilemma [86],

which makes the rate control in H.264 more challenging than previous standard. Several

rate control algorithms are proposed to solve the problem [87] [88] [89] [90] [91] [92].

6.3.2 Rate Control without Constraints on Distortion Fluctuation

We first derive the solutions for (6-32) based on different rate-distortion models.

Then the constraint on distortion fluctuations is introduced and a new optimal solution is

derived for the new constraint problem.

6.3.2.1 Algorithm framework

To derive the sparsity based algorithm, we first start with a scenario with two frames.

Suppose a video codec has finished encoding of the (n 1)th frame. Let Rt denote the

total budget for the nth and (n + 1)th frames. The rate distribution for the following two

frames is qrRt for the nth frame and (1 qr)Rt for the (n + 1)th frame, where rI E [0, 1] is

the distribution parameter. Then the problem (6-32) is reduced to


min D,(qrRt) + Dn+((1 qr)Rt) (6-33)
'17

where D(R) is the rate-distortion function.

We can separate the frame into SKIP region and non-SKIP region. As mentioned,

SKIP mode will not consume any bits from the budget. So only a portion of budget will

encode residual data. Then the distortion of the nth frame is


D-= pD +(1- p)D (6-34)


where p, is the percentage of SKIP MBs, DS is the average distortion of SKIP region

and DN is the average distortion of non-SKIP region. The reconstructed pixel values in

SKIP regions are exactly the same as previous frame. Considering the high similarity of

.,ii i:ent frames, we assume that DS + Dn and Dn is independent of T]. Then the goal is









to minimize the total distortion of two frames


Pn n n nDn J nn/lPn n n n(
minD = D + D+ = p[D. + (1 p)D + p (p[D + (1 p ) + (1 p )D%

(6-35)

A closed form solution will be available if we knew the closed form of rate-distortion

function in (6-35). The algorithm can be easily extended to multiple frames by assuming

all future frames after (n + 1)th frame consume the same rate as the (n + 1)th frame, since

the even distribution is a suboptimal approach without priors. Different rate-distortion

model will yield different solution for (6-35). In typical video coding systems, prediction

residuals are usually modeled as Gaussian, Laplacian [93] and Cauchy distributions [94].

We will derive solutions based on various rate-distortion models.

Gaussian Model.

If the source has Gaussian distribution, we have rate-distortion function as


D = a2 2-2R (636)


where a2 is the variance of the source and D is defined as mean square error (M\!S1 ). For

the nth and (n + 1)th frames, all the bits are used for the encoding of non-SKIP regions.

So we have

DN = a,2 2-2,7Rt (6 37)

DN+= a+2 2-2(1-Rq)Rt (638)

where a and a 2+ are the variances of the non-SKIP region of the nth and (n + 1)th

frame, respectively. Then Eq. (6-35) is


qn n n n n+12n2n n n
minD = p)Dn+(1 -p-)n2-27Rtn+(pT D + (1- p)2-27Rt) Rt)

+(1- pT 1)2 2-2(1- )Rt (6 39)
ln +ln+(639)









Set OD/9l = 0, we can find the optimal solution l*.

1 1 (1 p)(1 + pn+i)
= + 1 [Iog2( 1 ) + Iog2( 2)] (6-40)
2 4Rt 1 Pn+1 n+I

Laplacian Model.

If the source has Laplacian distribution with pdf p(x) = Ae-Alxl we have rate-distortion

function as [79] [95]
2-R
D = (6-41)
A
where A is the Laplacian parameter and D is defined as mean absolute difference (I\AD).

A can be estimated by A = 1 where p is the mean. Or by


A2
-2 = (642)


So we have the distortions for two frames

2-qrRt
DN (6-43)
"An

2-(i-,)Rt
DN+ --- An (6-44)
An+1
where An and An+1 are the Laplacian parameters of the non-SKIP region of the nth and

(n + 1)th frame, respectively.

Although MSE is used for distortion measure, it is reasonable to approximate MSE by

MAD. Then the problem can be solved by


min Dn(rlRt) + Dn+l((1 r)Rt) (6-45)
'7

Set OD/9Or = 0, we can find the optimal solution

_1 1-- (1 PT)(1 + PT+) An
S1 + [log( 1 ) -log( )] (646)
2 2Rt 1 Pn+i An+1

Cauchy model.









In real video codec [94] [96], Cauchy density is used to approximate the DCT
coefficient distribution of H.264. A zeromean Cauchy distribution has the pdf


p(x xER (6-47)
7(p2 + X2)

where p is the parameter of Cauchy distribution. The entropy function as a function of

quantization stepsize Q for a Cauchy distribution is [94]

2 Q 2 Q
H(Q) = arctan( 2( arctan())
7 2/1 7 2/1
2 1 1
-arctan( 1)Q ) og2(arctan( (41)) (648)
i- Q 4fp Q 4fp
The distortion function of a Cauchy distribution is

SQ 1i Q P2 + (i + 1/2)2 2
D(Q) = 2'IQ In( '2 'n)
i 7T 7 2 + (i- 1/2)2Q2
Sp_2 /2 C2
arctan( )
-2 arctan(2 +(i2 1/4)Q2
i 1

+ Q 2 2 arctan( ) (6 49)
7T 7T 2/

The rate-distortion of Cauchy source can be derived based on Eqs. (6-48) and (6-49).

However, the formula will be very complicated and not suitable for practical usage. The

rate-distortion model is approximated by a simpler formula

D = a Rb (6-50)

where a and b are parameters estimated from video data. These parameters can be

estimated in frame level and this model can provide more accurate estimation for

distortion. When we take a logarithm on both sides of Eq. (6-50), a linear relationship can

be established

log(D) = plog(R) + q (6-51)









where p = b and q = log(a). The plotting in Figure 6-6 shows that the linear relationship

holds for almost all frames in various video sequences.

To evaluate the accuracy of different models, we use the estimation errors in root of

mean squared error RMSE and R2. The R2 is a quantity used to measure the degree of

data variation from a given model. It is defined as

R2 i >(Xi X)2
= (6-52)
,(xi X)2

where Xi and /X are the actual and estimated values of data point i, respectively, and X

is the mean of all data points. The results of RMSE and R2 are listed in Table 6-1. It's

obvious that b < 0 since large rate will reduce the distortion.

Table 6-1. Model accuracy for R D
sequence RMSE R2
bus 0.0625 0.9976
foreman 0.0574 0.9967
mobile 0.0354 0.9994
paris 0.0506 0.9983


In this two-frame case, we can assume that a and b are the same for both frames.

Using Cauchy rate distortion function, Eq. (6-35) is expended as


minnD= pTD +(1-pT)a(TlRt)b+pT+(pT D +(1-p T)a(lRt)b) + (1 T+)a((1-) Rt)b

(6-53)

Set OD/r8l = 0, the optimal solution is


1*= (6-54)
1 + b1/(1-,)(i+,+1)
1+1


6.3.2.2 R-Q model

When the bit allocation are determined, a model maps rate to QP. Since budget bits

consist of texture bits for residual and head bits for motion and overheads, the R-Q model
















mobile


17


165


16


155

15


145


14


135

13


12 5
-


165

16


155

15


145 '

14


135

13


125

12 4 5 6 7 8
2 3 4 5 6 7


17 -



16



15 -



14-



13



12
10 11 12 5




17-


165


16


155

15


145


14-

135


S13 -


9 10 11 12 6


log(R)



Figure 6-6. The relation between R and D


log(R)


bus


only describes the relationship between bits for residual and QP. The first thing to do is to



estimate the number of bits for residual.



A simple method is to use the head bits from previous frame as a predictor. Then the



target rate for residual is the remainder of the total budget. The QP will be calculated by


MAD MAD
R= x1 +X2
Q Q2


(6-55)


where xi and x2 are model parameters, MAD is the mean absolute different between



original signal and its prediction and Q is the quantization stepsize. MAD is predicted by


10 11 12 13


10 11 12









MADs from previous frames by


MADn = al MADn-1 + a2 (6-56)


where MADn-i is the real MAD of (n 1)th frame and ar and a2 are model parameters.

When the current framed is encoded by the QP calculated using Eq. (6-55). All model

parameters in Eq. (6-55) and (6-56) will be estimated and stored for further usage.

6.3.2.3 Implementation details

In the real codec, the real p' is not available before the second frame is processed.

The p' is estimated by a linear model


p/T = 1Pi-1 + p 7 2 (6-57)

where pn- is the real p of (n 1)th frame and /i and /32 are model parameters, which

are updated after each frame is processed.

In the development of the algorithm, we make assumptions about the approximations

of distortion of SKIP and non-SKIP areas. These assumptions may not be valid when

there is a scene change in the video sequence. So we have to detect these sudden changes.

Usually, scene change will lead to the decreasing of p7. We can estimate p7 by assuming

zero motion in the nth frame. In other words, the simple frame difference is used as

prediction residual. Additional transforms and quantization are introduced to obtain

another estimate Then p and p7 are compared like [97]. If 71 exceeds a certain

threshold, it implies that nth frame is more different to (n 1)th frame as expected. In

this case, it is reasonable to spend more bits for nth frame.

The new sparsity based rate control is implemented in JM15.1 [75]. The whole

algorithm is described as follows.

1. Initial bit allocation for a frame.
Allocate a certain bit budget to the current frame Rt according to the approach in
JVT-G012 [98].

2. Calculate target rate.











Using the method to find the optimal solution r*. If all conditions are met, modify
the target rate of current frame to R- = Mr* Rt, where M is the number of
frames considered in the bit allocation algorithm. Otherwise, leave the target
rate unchanged.

3. Map to QP
According to the budget of current frame, determine the QP for encoding using R-Q
model in Eq. (6-55).

4. Model update.
Update the model parameters used in the algorithm based on the data in current
frame and previous frames. The model parameters are estimated by the least square
approximation.

6.3.2.4 Experimental results

Experiment Settings.

To evaluate the performance of proposed algorithm, we use a real video codec.

The state-of-the-art video coding standard H.264 is used. The encoding flowchart is

demonstrated in Figure 6-7. The proposed algorithm is part of the coderr control" module

on the top of the diagram.

Coder
Control


Input
Video Transform Quantization

De-Quantization

Inv. Transform
Entropy Bit
a) Coding Stream

Deblocking
Intra-frame Filter
Prediction


Motion Output
Compensation Video


Figure 6-7. Encoding process with rate control in H.264.









In our experiments, all frames are encoded as P-frames except for the first I-frame.

There are 150 frames for evaluation. Several common test video sequences in CIF type

are tested. All algorithm are tested with four target rates (i.e. 64, 128, 256, 384 kbps).

The rate control algorithm are operating at frame level. In this experiment, we compare

proposed bit allocation algorithm to JVT-G012 [98] algorithm for rate control. The rate

control operates at frame level. Both algorithms use the same R-Q model. The only

difference is that proposed algorithm uses adaptive bit allocation instead of average

distribution.

Based on the gamma rate theory, to evaluate a rate control algorithm, we need to

use the triplet (R, D, 7D) to quantify the performance. Since proposed algorithm in this

section and the rate control algorithm in H.264 can not control the distortion fluctuation,

we compare the R-D performance of proposed algorithm and the rate control algorithm

in H.264. At the same time, we list the gains of distortion fluctuations between two

algorithms. If proposed algorithm improves the R-D performance over H.264, that means

it needs smaller rate to achieve the same PSNR as H.264 does. According to the gamma

rate theory, larger distortion fluctuations are expected.

In Figure 6-8, we present rate distortion curves of proposed algorithm. We compared

proposed algorithm based on Gaussian, Laplacian and Cauchy models with H.264. Results

show that proposed algorithm improve the average PSNR over H.264 for all tested video

sequences. Detailed experimental results are available in following sections. The average

rate difference between proposed algorithm and JVT-G012 in H.264 is calculated as [76].

The average difference is the percentage of rate change w.r.t to the rates in JVT-G012 for

given PSNR. We also list the average difference of 7D between proposed algorithm and

JVT-G012 in H.264.

Gaussian Model.

In this experiment, we use Gaussian rate distortion function to obtain the optimal bit

allocation. The experimental results are listed in Table 6-2. There are 4.98% rate saving
















hall monitor


40


39


38


S37


S36-


35 -


34 -


33
50





42

41

40

39

38

K 37

S36

35

34

33

32
50


35

34

33

32

31

S30

29

28

27

26
50





38

37

36

35

34

K 33

a 32

31

30

29

28
50


H 264
Proposed (Laplacian)
S Proposed (Gaussian)
-- Proposed (Cauchy)
100 150 200 250 300 350 400
bitrate (kbps)



waterfall


H 264
Proposed (Laplacian)
Proposed (Gaussian)
Proposed (Cauchy)
100 150 200 250 300 350 400
bitrate (kbps)



news












/



SH 264
Proposed (Laplacian)
Proposed (Gaussian)
Proposed (Cauchy)
100 150 200 250 300 350 400
bitrate (kbps)


H 264
Proposed (Laplacian)
SProposed (Gaussian)
-- Proposed (Cauchy)
150 200 250 300 350 400
bitrate (kbps)


Figure 6-8.


Rate distortion comparison between H.264

different models.


and proposed algorithm with


on average over H.264. We can observe that for all tested sequences the rates decrease and



distortion fluctuations increase. It means proposed algorithm saves more bits in the cost of



introducing more distortion fluctuations. So there is a tradeoff between rate and distortion



fluctuation.



Laplacian Model.



Experimental results using Laplacian model are listed in Table 6-3. There are on



average 5.03% rate saving compared to H.264. For sequence 'hall monitor' the average



rate saving is 12%. While achieving rate saving, the proposed algorithm suffers large



distortion fluctuations. In Table 6-3, all the 7D differences are greater than 0, which means


100


parins









Table 6-2. Performance comparison of H.264 and proposed algorithm using Gaussian
model
Sequence R(kbps) JVT-G012 Proposed average 7D difference
rate PSNR rate PSNR rate difference (dB)
hall_monitor 64 64.27 34.45 64.62 34.73 -11.9' 0.48
128 130.37 36.47 129.41 36.79
256 256.69 38.11 257.3 38.54
384 384.48 39.11 384.45 39.29
waterfall 64 67.01 28.34 66.91 28.26 -1.; 0.16
128 129.15 31.87 129.77 31.56
256 256.47 34.75 257.33 35.46
384 383.32 36.27 385.8 36.96
bus 64 82.61 22.84 82.23 22.82 -0.7' 0.15
128 131.79 25.1 136.73 25.21
256 256.04 28.04 257.45 28.2
384 384.04 29.82 386.37 29.99
news 64 64.33 32.72 66.76 32.59 -4.1 0.07
128 128.12 35.54 128.89 36.05
256 256.19 39.54 256.99 39.64
384 384.48 41.55 383.98 41.58
paris 64 66.36 26.57 67.7 26.42 -6 0.29
128 128.16 28.91 129.49 29.19
256 256.33 31.7 258.48 32.25
384 383.84 33.55 385.43 34.52


proposed algorithm introduces more distortion fluctuations. Proposed algorithm trades

large distortion fluctuation for small rate when achieving the same PSNR as H.264.

Cauchy Model.

In Table 6-4, results from proposed algorithm based on Cauchy model achieve

5.1% rate saving over H.264 on average. The performance is better than Laplacian and

Gaussian model. So Cauchy model is a useful tool to describe the rate-distortion function.

In Table 6-4, we also observe that proposed algorithm trades large distortion fluctuations

for small rates.

Experimental results of sparsity based rate control algorithm without constraints

on distortion fluctuations show that there is a tradeoff between rate and distortion

fluctuations, which validates the gamma rate theory.









Table 6-3. Performance comparison of H.264 and proposed algorithm using Laplacian
model
Sequence R(kbps) JVT-G012 Proposed average 7D difference
rate PSNR rate PSNR rate difference (dB)
hall_monitor 64 64.27 34.45 64.62 34.73 -1 0.48
128 130.37 36.47 129.41 36.79
256 256.69 38.11 257.3 38.54
384 384.48 39.11 ;;,. :2 39.3
waterfall 64 67.01 28.34 66.91 28.26 -1. !' 0.16
128 129.15 31.87 129.61 31.57
256 256.47 34.75 256.97 35.46
384 383.32 36.27 386.13 37.07
bus 64 82.61 22.84 82.61 22.84 -0 0.23
128 131.79 25.1 135.43 25.19
256 256.04 28.04 257.68 28.18
384 384.04 29.82 388.84 30.02
news 64 64.33 32.72 66.76 32.59 -4.1 0.07
128 128.12 35.54 128.89 36.05
256 256.19 39.54 256.99 39.64
384 384.48 41.55 383.98 41.58
paris 64 66.36 26.57 67.7 26.42 -6 0.29
128 128.16 28.91 129.49 29.19
256 256.33 31.7 258.48 32.25
384 383.84 33.55 385.43 34.52


6.3.3 Rate Control with Constraints on Distortion Fluctuation

6.3.3.1 Algorithm framework

To provide viewer friendly video, we formulate the problem by introducing constraints

on the fluctuation of distortions. Suppose the scenario is with the same setting with

previous section. Then the problem is to find the solution for


min Dn(rRt) + Dn+l((1
'7


(6-58)


where 7D is a given threshold for viewer quality. The problem is solved by Lagrangian

multiplier, so we have


min D,(TlRt) + Dn+1((1 tr)Rt) + A(IDn D+1 7D)
'l,A


Tr)Rt) s.t. ID, Dn+il < 7D


(6-59)









Table 6-4. Performance comparison of H.264 and proposed algorithm using C !v model
Sequence R(kbps) JVT-G012 Proposed average 7D difference
rate PSNR rate PSNR rate difference (dB)
hall_monitor 64 64.27 34.45 64.62 34.73 -1 0.48
128 130.37 36.47 129.41 36.79
256 256.69 38.11 257.3 38.54
384 384.48 39.11 ;,. : 39.3
waterfall 64 67.01 28.34 66.91 28.26 -1.7 0.13
128 129.15 31.87 129.61 31.57
256 256.47 34.75 257.56 35.51
384 383.32 36.27 385.97 37.01
bus 64 82.61 22.84 82.61 22.84 -0 0.36
128 131.79 25.1 135.43 25.19
256 256.04 28.04 259.29 28.2
384 384.04 29.82 ::. '. 30.08
news 64 64.33 32.72 66.76 32.59 -4.1 0.07
128 128.12 35.54 128.89 36.05
256 256.19 39.54 256.99 39.64
384 384.48 41.55 383.98 41.58
paris 64 66.36 26.57 67.7 26.42 -6 0.29
128 128.16 28.91 129.49 29.19
256 256.33 31.7 258.48 32.25
384 383.84 33.55 385.43 34.52


where A is a Lagrangian multiplier and also a function of 70.

Taking the SKIP region into account, we can write the distortion as


Dn = pD+(1- p ) D
With the assumption that D D, Eq. (659) can e written as
With the assumption that DS n+1 D, Eq. (6-59) can be written as


(6-60)


min F = pnTDs + (1
'q' n


+A(pD + (1


p )D+ p D+ (
PT)DN n( D + (1


p )D Pn (P D, +- (1


pT)DN)+ (1


Pni )D +1


(6-61)


The solution can be found when


(6-62)

(6-63)


p)DN) 1 T N1)
Pn )ln n+-)Dn+









We will use different rate-distortion model to provide closed form solution for the

problem.

Gaussian model.

If the source has Gaussian distribution, we can expand Eq. (6-61) as

min F = pTD (1 p/)(1 + p 1),-2 2 -2p7Rt Pn T + p (1S p( n T \1)2 2-2(1-q )Rt
A n n + p, ) n+ n n n n n+ n
17,A
+A(pT(1 pT+)Dn + (1 pT)(1 pnT 1) 2)-2i-Rt _1 Pn+ +1)2 -2(1-)Rt) (6-64)

The solution of Eq. (6-64) can be found by setting F- = 0 and = 0.

Because 7D is just a bound, the optimal solution satisfies < rl*. To find r, we have

to reach the one with minimal cost. The search range is [0.5, r]. F/1r 0 = 0 yields

S(1 pT ),2 2-2(1-,qR 1 nT)(1 T 1).2 2-2iR
(A n n1 n (6-65)
(1- Pn 1)(2 2-2(1-q)R + (1 P)2 2-2?R)

We can get As corresponding to rI E [0.5, Tr*] using Eq. (6-65).

Once we find the optimal A*, the optimal r is

1 1 (1 p + pT 1) A*(1 P1 )] n
S= + [log2( n +)]) 21 og2( (6-66)
2 4Rt (1 pn)(1 A*) n

However, an observation is that under this model, AD increases along rl. But our goal

is to minimize total distortion. We knew the unconstraint solution.

1 1 (1- pT)(1 + pT 2
= + [log2( T ) og2( n )] (6-67)
2 4Rt 1 Pn+1 n+1

The constraints on distortion fluctuation actually introduce a constraint on the region of

rl. The corresponding bound is

1 C + -'C2 + 4(1 -p12 -2Rt
log2n 2nn (6 68)
2Rt 2(1 pn)n72

1-Pn+
1 1









So the final optimal solution is


-= min(T/*, T1) (6-69)


Constraint in dB.

If the constraint is given in dB, we have

Dn
10 loglo = PSNR2 PSNR1 < 7D (6-70)
Dn+l

Let r = 10D/10, we have

pn DS + (1 -p)D
Pn \ n nr (6-71)
-P D p + (1 p 1 pT ()D6 -)
TI(/Tn Dn + 1 -

The corresponding bound of region of r]

1 C + vC2 + 4(1 pT)(1 rp 1)r(1 p- 1)2 22-2Rt
71 = log2 n-n-- 21T J n + (6-72)
2Rt 2(1 p)(1 rp

where C = p DS(1 rp' ).

So the final optimal solution is


= min(%*, <+) (6-73)


Laplacian model.

Since the distortion of Laplacian model is defined as MAD, it is not straightforward

to derive the solution given the constraints on fluctuation of distortions.

Cauchy model.

With Cm. 1r!:v approximation model, the constraint on distortion fluctuation is

Ip, D, + (1 pT)DN p (p Dn + (1 p)D ) + (1- p,)D < 7D (6-74)
n n n Pn+n n n nn+lDn+l

The boundary point T1+ corresponding to the constraint will be


(1 p b +b (6-75)
arrt









where C= -_ D-p, Dn(1-p 1)
1-PTT+1
I+ can be solved by numerical methods. In the real codec, we can have a look-up

table to retrieve ry+. The final solution is obtained by n = min(q%*, l+).

Constraint in dB.

If the constraint 7D is given in dB, let r = 107D/10. Then we have

pD + (1 p )a(qRt)b < r (6-76)
Pn1(pn TD + (1 p)a(qRt)b) + (1 1)a((1 ql)Rt)b -

I+ can be calculated using numerical methods for


(1 pT)(1 rpT) + r(1 )(1 +)b = C (6-77)

where C = pD(i-rpbT+ The final solution is n = min(?q*, TI+).

6.3.3.2 Implementation details

The algorithm with constraints on distortion fluctuation is implemented in JM15.1

[75]. The implementation is similar to algorithm without constraints on distortion

fluctuation. The only difference is that we add the fluctuation check after we encode

the current frame by estimated QP. This check will verify the constraints on distortion

fluctuation. If the check fails, it will re-encode the frame with modified QP until the check

is met. So this algorithm will introduce complexity due to re-encoding frames.

The whole algorithm is described as follows.

1. Initial bit allocation for a frame.
Allocate a certain bit budget to the current frame Rt according to the approach in
JVT-G012 [98].

2. Calculate target rate.
Using the method to find the optimal solution q. If all conditions are met, modify
the target rate of current frame to R' = MRRt, where M is the number of
frames considered in the bit allocation algorithm. Otherwise, leave the target
rate unchanged.

3. Map to QP
According to the budget of current frame, determine the QP for encoding using R-Q
model in Eq. (6-55).









4. Fluctuation check.
C'! 1: the difference of distortion between current frame and previous frames. If the
distortion fluctuation constraint is violated, increase or decrease current QP and
re-encode the current frame until the constraint is met.

5. Model update.
Update the model parameters used in the algorithm based on the data in current
frame and previous frames. The model parameters are estimated by the least square
approximation.

To make a fair comparison, we modified the rate control algorithm in H.264 by adding

the same fluctuation check step at the end of frame encoding. We refer this new algorithm

for H.264 as modified H.264 in the rest of this chapter.

6.3.3.3 Experimental results

Experiment Settings.

To evaluate the performance of proposed algorithm, we use a real video codec.

The state-of-the-art video coding standard H.264 is used. The encoding flowchart is

demonstrated in Figure 6-7. The proposed algorithm is part of the coderr control" module

on the top of the diagram. In our experiments, all frames are encoded as P-frames except

for the first I-frame. There are 150 frames for evaluation. Several common test video

sequences in CIF type are tested. All algorithm are tested with four target rates (i.e.

64, 128, 256, 384 kbps). The rate control algorithm are operating at frame level. In our

experiments, we choose the constraint changes from 0.4 dB to 1.4 dB with stepsize 0.1 dB.

Based on the gamma rate theory, to evaluate a rate control algorithm, we need to

use the triplet (R, D, 7D) to quantify the performance. In particular, we compare the R-D

performances of rate control algorithms with the same 7D.

With the same distortion fluctuation, i.e. 7D, proposed algorithm achieves better R-D

performance than modified H.264 for different target bit rates and different sequences.

Figure 6-9 shows the R-D performances of proposed algorithm and I.,,/ I.1,: 1, H.264 with

70 = 0.8dB for various test video sequences. It can be observed that proposed algorithm

outperforms i,.I./: ,.1 H.264 in R-D performance.

















40


39



38


of 37
z/

36


35
Proposed
Modified H 264
34
50 100 150 200 250 300 350 400
R (kbps)


A 'hall monitor'



37

36

35

34

S33

S32

31 -

30

29 / Proposed
Modified H 264
28
50 100 150 200 250 300 350 400
R (kbps)


B 'waterfall'



42

41 -

40

39

38 -

of 37

36

35

34 /

33 / Proposed
Modified H 264
32
50 100 150 200 250 300 350 400
R (kbps)


C 'news'



Figure 6-9. Rate distortion comparisons with 7D = 0.8dB.













To demonstrate the trade-off between rate and distortion fluctuation described


by the gamma rate theory, we plot the relationships between rate and 7D for the same


distortions. As shown in Figure 6-10 and 6-11, both algorithm can use small rate and large


distortion fluctuation or large rate and small distortion fluctuation to achieve the same


PSNR. In other words, to achieve the same distortion, a tradeoff will be made between


rate and distortion fluctuation. This is predicted by the gamma rate theory.



1


09-





07-


06


05


04'
220 225 230


235 240 245 250 255 260
R (kbps)

A 37.78 dB


04
320 330 340 350 360 370
R (kbps)

B 38.59 dB


Figure 6-10. R vs yD of 'hall monitor' by ,I n.'.:i; ., H.264.


09


08


5 07


06


05


380 390































210 220 230
R (kbps)

A 37.72 dB


240 250 260


04
290 300 310 320 330 340 350 360 370 380 390
R (kbps)

B 38.62 dB


Figure 6-11. R vs 7D of 'hall monitor' by proposed algorithm.



6.4 Summary


In this chapter, we started with adding constraints on distortion fluctuation in video


coding problems. The gamma rate theory was proposed to provide theoretical guides for


rate control algorithms. Later, a sparsity based rate control algorithm without and with


constraints on distortion fluctuations were proposed to improve R-D performance. In


the new evaluation system (R, D, 7D) introduced based on the gamma rate theory, the


proposed algorithm demonstrated that there is a tradeoff between rate and distortion


09


08


5 07


06


05


09


08


5 07


06


05









fluctuation in real rate control algorithms, which proved that the gamma rate theory is

still valid for practical rate control schemes.









CHAPTER 7
CONCLUSION

7.1 Summary

In this section, we summarize the research presented in this dissertation.

This dissertation explored algorithms in image and video processing based on sparsity.

Two different approaches were adopted: data independent and data dependent approaches.

The background and related work were introduced in C'! lpter 1.

C'! lpter 2 proposed a new transform called Ripplet transform type I by generalizing

Curvelet transform to represent curves more efficiently in images. We introduced support

c and degree d in addition to scale, location and direction parameters in the definition,

while the curvelet transform is just a special case of the ripplet transform type I with

c = 1 and d = 2. The flexibility enabled the capability of yielding more efficient

representations for images with singularities. Ripplet-I functions form a new tight frame

for the functional space. The ripplets have good localization in both spatial and frequency

domain. In particular, we developed forward and backward Ripplet transform type I for

both continuous and discrete cases. The highly directional ripplets have general scaling

with arbitrary degree and support, which can capture 2D singularities along different

curves in any directions. To evaluate the performance of proposed transform, Ripplet

transform type I was applied to image compression and image denoising. Experimental

results indicated that ripplet-I transform can provide more efficient representations of

images with singularities along smooth curves. Using a few coefficients, ripplet-I can

outperform DCT and wavelet transform in nonlinear approximation. It is promising to

combine ripplet-I transform and other transforms such as DCT to represent the entire

image, which contains object boundaries and textures. Ripplet-I transform is used to

represent the structure and texture, while DCT is used to compress smooth parts. The

sparse representation of ripplet-I transform also demonstrated potential success in image

denoising.









In C'! i pter 3 we introduced Ripplet transform type II based on generalized Radon

transform for resolving 2D singularities. The new transform converts 2D singularities

into 1D singularities through generalized Radon transform. Then 1D wavelet transform

is used to resolve the 1D singularities. Both forward and inverse ripplet-II transform

were developed for continuous and discrete cases. Ripplet-II transform with d = 2 can

achieve sparser representation for 2D images, compared to ridgelet. Some properties of the

new transform are explored. Ripplet-II transform also enjoys rotation invariant property,

which can be leveraged by applications such as texture classification and image retrieval.

Experiments in texture classification and image retrieval demonstrated that the ripplet-II

transform based scheme outperforms wavelet and ridgelet transform based approaches.

C'!i lpter 4 presented data dependent approaches. In particular, we proposed a

general framework that explores self similarities inside signals to achieve sparsity. The

framework unified three sparsity based denoising techniques and applied them for video

compression artifact removal problem. We compared the de-artifacting performance of

the algorithms considering three aspects: patch dimension, transform dimension, and

quantization parameter. During the comparison, simulations were performed to support

our analysis, which may provide guidelines for applying similar denoising algorithms to

video compression work in the future.

In C'! Ilpter 5, we proposed several techniques to improve the performance of video

coding through sparsity enhancement of prediction residuals. To capture the structure

of contents in a MB, heterogeneous block pattern was introduced to fit the content

adaptively. In addition to model based intra prediction, we proposed to directly search for

the best prediction in pre-encoded image regions. These techniques can improve the video

coding performance by improving the sparsity of prediction residual data. The algorithms

were implemented based on the hybrid video coding framework in H.264. Experimental

results demonstrated gains over H.264 by proposed algorithms.









In C'!I ilter 6, we explored the rate-distortion behaviors after introducing new

constraints on the smoothness of distortion. Uncertainty principle was proposed to

provide guide for rate control algorithm designing. There is a tradeoff between the

distortion fluctuation and the target rate. A sparsity based rate control algorithm was

proposed to improve the coding efficiency with limitation on bit rate. Rate-distortion

models: Gaussian, Laplacian and Cauchy models, were employ, ,1 to derive the optimal bit

allocation for cases with and without constraints on distortion smoothness. The proposed

algorithm was evaluated in the (R, D, 7D) fashion based on the gamma rate theory.

Experimental results demonstrated that the gamma rate theory is valid for practical rate

control schemes.

7.2 Future Work

In this section, we point out future research directions.

Sparsity Based Rate Control

In C'!i Ilter 6, we proposed a sparsity based rate control algorithm. The rate control

algorithm works for bit allocation among multiple frames. There are several directions to

extend the algorithm.

* Multiple-level
We will first extend the bit allocation algorithm to frame level, and MB level. For
frame level algorithms, we attend to find the optimal bit allocation among different
MBs. For MB level algorithm, we will distribute budget among different transform
coefficients in a rate-distortion optimization fashion. Then We will find the optimal
bit allocation for the whole video sequence by jointly considering multiple-frame,
frame and MB level.

Combine ps and p7
In the rate control algorithm, we will use p domain rate control [83]. To distinguish
with p7 used in C'! Ilpter 6, we use ps here. ps carries the sparsity information in
spatial domain and pT describes the sparsity information in temporal domain. We
can derive rate control algorithms by exploring sparsity in both spatial and temporal
domains. A joint scheme of ps and p' will be designed. By jointly tuning ps and p,
we can expect superior performance over existing approaches.









REFERENCES

[1] J. Tropp, Topics in sparse approximation, Ph.D. dissertation, University of Texas at
Austin, 2004.

[2] D. L. Donoho, M. Vetterli, R. A. Devore, and I. Daubechies, "Data compression and
harmonic analysis," IEEE Transactions on Inrf..' ,,it.l:.n The..,;, vol. 44, no. 6, pp.
2435-2476, October 1998.

[3] A. Oppenheim, A. Willsky, and I. Young, S...'- and si;-/. /,- Prentice Hall
Englewood Cliffs, NJ, 1983.

[4] H. Kramer and M. Mathews, "A linear coding for transmitting a set of correlated
signals," IRE Transactions on Information Th(..,;i vol. 2, no. 3, pp. 41-46, 1956.

[5] S. Watanabe, "Karhunen-Loeve expansion and factor analysis: theoretical remarks
and applications," Pattern Recognition: Introduction and Foundations, pp. 635-660,
1973.

[6] N. Ahmed, T. Natare, i and K. Rao, "Discrete cosine transform," IEEE Transac-
tions on Computers, vol. 100, no. 23, pp. 90-93, 1974.

[7] G. Wallace, "The JPEG still picture compression standard," IEEE Transactions on
Consumer Electronics, vol. 38, no. 1, 1992.

[8] C. Burrus, R. Gopinath, and H. Guo, Introduction to wavelets and wavelet trans-
forms: a primer, Prentice Hall, 1998.

[9] M. Vetterli and C. Herley, \\ -.v !. ts and filter banks: Theory and design," IEEE
Transactions on S,:'lI1 Processing, vol. 40, no. 9, pp. 2207-2232, September 1992.

[10] M. Vetterli and J. Kovacevic, Wavelet and subband coding, Prentice-Hall, Englewood
Cliffs, NJ, 1995.

[11] I. Daubechies, Ten lectures on wavelets, SIAM, Philadelphia, PA, 1992.

[12] S. Mallat, A wavelet tour of ..:t,.l processing, Academic, New York, 2nd edition,
1999.

[13] C. ('!Ci-..', ulos, A. Skodras, and T. Ebrahimi, "The JPEG2000 still image coding
system: an overview," IEEE Transactions on Consumer Electronics, vol. 46, no. 4,
pp. 1103-1127, 2000.

[14] R. Bracewell, "The fourier transform," Scientific American, vol. 260, no. 6, pp. 86-95,
1989.

[15] E. Brigham, The fast Fourier 'I,,rf.,rm and its applications, Prentice Hall Englewood
Cliffs, NJ, 1988.









[16] J. Foster and F. Richards, "The Gibbs phenomenon for piecewise-linear
approximation," The American Mathematical Mon/hll, vol. 98, no. 1, pp. 47-49,
1991.

[17] A. Jerri, The Gibbs phenomenon in Fourier i,.il.; splines, and wavelet approxima-
tions, Springer, 1998.

[18] I. Daubechies, "The wavelet transform, time-frequency localization and signal
analysis," IEEE Transactions on Information Ti,... vol. 36, no. 5, pp. 961-1005,
1990.

[19] E. J. Candes and D. L. Donoho, "Ridgelets: a key to higher dimensional
intermittency?," Philosophical Transactions: Mathematical, P,;, .:. il and Engineering
Sciences, pp. 2459-2509, 1999.

[20] E. J. Candes, Ridgelet: Theory and Application, Ph.D. dissertation, Stanford
University, 1998.

[21] E. J. Candes and D. L. Donoho, \ .-- tight frames of curvelets and optimal
representations of objects with piecewise c2 singularities," Communications on
Pure and Applied Mathematics, vol. 57, no. 2, pp. 219-266, February 2003.

[22] E. J. Candes, L. Demanet, D. L. Donoho, and L. Ying, I: 1- discrete curvelet
transforms," Multiscale Modeling and Simulation, vol. 5, pp. 861-899, 2005.

[23] G. Easley, D. Labate, and W. Lim, "Sparse directional image representations using
the discrete shearlet transform," Applied and Computational Harmonic A,.,l;-,.: vol.
25, no. 1, pp. 25-46, 2008.

[24] D. Labate, W. Lim, G. Kutyniok, and G. Weiss, "Sparse multidimensional
representation using shearlets," in SPIE Proceedings 5914, SPIE, B, ll.:n.,l,,,
WA, 2005, pp. 254-262.

[25] B. Atal and S. Hanauer, "Speech analysis and synthesis by linear prediction of
the speech wave," Journal of the Acoustical S'... Ii of America, vol. 50, no. 2, pp.
637-655, 1971.

[26] H. Strube, "Linear prediction on a warped frequency scale," The Journal of the
Acoustical S'. .. I/; of America, vol. 68, pp. 1071, 1980.

[27] A. Harma and U. Laine, "A comparison of warped and conventional linear predictive
coding," IEEE Transactions on Speech and Audio Processing, vol. 9, no. 5, pp.
579-588, 2001.

[28] J. Gibson, Digital compression for multimedia: principles and standards, Morgan
Kaufmann, 1998.

[29] K. Pohlmann and K. Pohlman, Principles of digital audio, McGraw-Hill, 2005.









[30] G. Box, G. Jenkins, and G. Reinsel, Time series n,.,l;.i: forecasting and control,
Holden-d i San Francisco, 1976.

[31] 0. Guleryuz, \\. i,!!ii. d overcomplete d, ,. i-in: in Conference Record of the
Tli,,it;i-Seventh Asilomar Conference on S...Irl S,1i. i- and Computers, 2003, vol. 2.

[32] Y. N ,!: .a and H. Harashima, "Motion compensation based on spatial
transformations," IEEE Transactions on Circuits and S1,/. ii' for Video T I,,1.. I..'~
vol. 4, no. 3, pp. 339-356, 1994.

[33] T. Wedi, "Motion compensation in H. 264/AVC," IEEE Transactions on Circuits and
Si-1. m- for Video T. ,.,l..,- vol. 13, no. 7, pp. 577-586, 2003.

[34] K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian, liin i', Denoising by Sparse 3-D
Transform-Domain Collaborative Filtering," IEEE Transactions on Image Processing,
vol. 16, no. 8, pp. 2080, 2007.

[35] K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian, "Video Denoising by Sparse 3-D
Transform-Domain Collaborative Filtering," in Proceedings of 15th European S.:I',,i
Processing Conference, Sep. 2007.

[36] X. Li and Y. Z!, ih- "Patch-based video processing: a variational B '.v in
approach," IEEE Transactions on Circuits and Sr -ii'/ for Video T In,,..i,;
vol. 19, no. 1, pp. 27-40, 2009.

[37] M. Do and M. Vetterli, "The finite ridgelet transform for image representation,"
IEEE Transactions on Image Processing, vol. 12, no. 1, pp. 16-28, January 2003.

[38] S. R. Deans, The Radon transform and some of its applications, John Wiley & Sons,
New York, 1983.

[39] J. L. Starck, E. J. Candes, and D. L. Donoho, "The curvelet transform for image
d,, ir.-ii.; IEEE Transactions on Image Processing, vol. 11, pp. 670-684, June 2002.

[40] D. L. Donoho and M. R. Duncan, "Digital curvelet transform: strategy,
implementation and experiments," in Proceedings of Aerosense 2000, Wavelet
Applications VII. SPIE, 2000, vol. 4056, pp. 12-29.

[41] E. J. Candes and D. L. Donoho, "Continuous curvelet transform: I. Resolution of the
wavefront set," Applied and Computational Harmonic A,,'.';t,; vol. 19, no. 2, pp.
162-197, 2005.

[42] E. J. Candes and D. L. Donoho, "Continuous curvelet transform: II. Discretization
and frames," Applied and Computational Harmonic A,.al;- .- vol. 19, pp. 198-222,
2005.

[43] L. Hormander, The /i.,.:-r of linear partial differential operators, Springer-V i1 -
Berlin, 2003.









[44] M. N. Do and M. Vetterli, "The contourlet transform: an efficient directional
multiresolution image representation," IEEE Transactions on Image Processing, vol.
14, no. 12, pp. 2091-2106, December 2005.

[45] M. N. Do and M. Vetterli, "Contourlets," in Beyond wavelets, G. V. Welland, Ed.
Academic Press, New York, 2003.

[46] E. Le Pennec and S. Mallat, "Sparse geometric image representations with
bandelets," IEEE Transactions on Image Processing, vol. 14, no. 4, pp. 423-438,
2005.

[47] A. Cohle, I. Daubechies, and J. Feauveau, "Biothogonal base of compactly supported
wavelets," Communications on Pure and Applied Mathematics, vol. 4, pp. 45-47,
1992.

[48] D. Taubman, "High performance scalable image compression with EBCOT," IEEE
Transactions on Image Processing, vol. 9, no. 7, pp. 1158-1170, 2000.

[49] D. Taubman, E. Ordentlich, M. Weinberger, G. Seroussi, I. Ueno, and F. Ono,
"Embedded block coding in JPEG2000," in Proceedings of IEEE International
Conference on Image Processing, 2000, vol. 2.

[50] "Textures," http://sipi.usc.edu/database/database.cgi?volume textures.

[51] J. Xu and D. Wu, "Ripplet transform for feature extraction," in Proceedings of SPIE
Defense S.. ;, i. Symposium, March 2008, vol. 6970, pp. 69700X-69700X-10.

[52] J. Xu and D. Wu, "Ripplet-II transform for feature extraction," in Proceedings of
SPIE Visual Communications and Image Processing, July 2010.

[53] A. Cormack, "The Radon transform on a family of curves in the plane (I)," Proceed-
ings of the American Mathematical ... I, vol. 83, no. 2, pp. 325-330, 1981.

[54] A. Cormack, "The Radon transform on a family of curves in the plane (II),"
Proceedings of the American Mathematical S.. .. I/; vol. 86, no. 2, pp. 293-298, 1982.

[55] F. NATTERER, The mathematics of computerized i,. ,,,.,i','li';, SIAM, 2001.

[56] K. Denecker, J. Van Overloop, and F. Sommen, "The general quadratic Radon
transform," Inverse Problems, vol. 14, no. 3, pp. 615-634, 1998.

[57] "Usc-sipi image database," http://sipi.usc.edu/database.

[58] "Rotated textures," http://sipi.usc.edu/database/database.cgi?volume rotate.

[59] C. Pun and M. Lee, "Log-polar wavelet energy signatures for rotation and scale
invariant texture classification," IEEE Transactions on Pattern A,.al -.:- and Machine
Intelligence, vol. 25, no. 5, pp. 590-603, 2003.

[60] I. Jolliffe, Principal component i,..l;-.,' Springer V i1 .- 2002.









[61] P. List, A. Joch, J. Lainema, G. Bjntegaard, and M. Karczewicz, "Adaptive
deblocking filter," IEEE Transactions on Circuits and S, I iim- for Video Tech-
r. ,l..-,/i vol. 13, no. 7, pp. 614-619, 2003.

[62] C. Dorea, O. Escoda, P. Yin, and C. Gomila, "A Direction-adaptive In-loop
Deartifacting Filter for Video Coding," in Proceedings of IEEE International
Conference on Image Processing, 2008, pp. 1624-1627.

[63] 0. Guleryuz, "A nonlinear loop filter for quantization noise removal in hybrid video
compression," in Proceedings of IEEE International Conference on Image Processing,
2005.

[64] R. Gonzalez and R. Woods, "Digital image processing," Addison W.:-;, 1992.

[65] R. Brown and P. Hwang, Introduction to random -,..:l- and applied Kalman filtering,
John Wiley & Sons, 1992.

[66] ITU-T, "Video Codec for Audiovisual Service at Px64 Kbits/s," ITU-T recommenda-
tion H.261 version 1, 1990.

[67] I. J. 1, "Coding of Moving Pictures and Associated Audio-for Digital Storage Media
at up to about 1.5 Mbit/s Part 2: Video," ISO/IEC 11172(_/PEG 1), November
1993.

[68] ITU-T and I. J. 1, "Generic coding of moving pictures and associated audio
information Part 2: Video," ITU-T Rec. H.262 and ISO/IEC 13818-2 (1_PEG-
2), November 1994.

[69] ITU-T, "Video coding for low bit rate communication," ITU-T recommendation
H.263 version 1, 1995.

[70] I. J. 1, "Coding of audio-visual objects Part 2: Visual," ISO/IEC 14496-2(1 PEG-4
Part 2), January 1999.

[71] I. Recommendation, "H. 264/AVC Video Coding Standard," 2003.

[72] G. Sullivan and T. Wiegnad, "Video compressionfrom concepts to the H. 264/AVC
standard," Proceedings of the IEEE, vol. 93, no. 1, pp. 18-31, 2005.

[73] T. Wiegand, G. Sullivan, G. Bjontegaard, and A. Luthra, "Overview of the H.
264/AVC video coding standard," IEEE Transactions on Circuits and S, i-1 /I, for
Video T. ./..' vol. 13, no. 7, pp. 560-576, 2003.

[74] T. Wiegand, X. Zli ilr and B. Girod, "Long-term memory motion-compensated
prediction," IEEE Transactions on Circuits and S, ii-. for Video T ,. .I.. .-/;, vol. 9,
no. 1, pp. 70-84, 1999.

[75] "H.264/avc reference software," http://iphome.hhi.de/suehring/tml/download/.









[76] G. Bjontegaard, "Calculation of average PSNR differences between RD-curves," Tech.
Rep. VCEG-M33, ITU-T Q.6/SG16, April 2001.

[77] C. Shannon, "A mathematical theory of communication," Bell System Technical
Journal, vol. 27, pp. 379-423, 1948.

[78] C. Shannon, "Coding theorems for a discrete source with a fidelity criterion,"
Information and decision processes, pp. 93-126, 1960.

[79] T. Berger, Rate distortion '#,..,i'; Prentice-Hall Englewood Cliffs, NJ, 1971.

[80] R. Gray, Source coding I/ ..' ,; Springer, 1990.

[81] T. Cover and J. Thomas, Elements of information I, .. i,; Wiley, 2006.

[82] Z. He and S. Mitra, "A linear source model and a unified rate control algorithm
for DCT video coding," IEEE Transactions on Circuits and S',/. n for Video
T. I,.,.l .,i/; vol. 12, no. 11, pp. 970-982, 2002.

[83] Z. He and S. Mitra, "Optimum bit allocation and accurate rate control for video
coding via/spl rho/-domain source modeling," IEEE Transactions on Circuits and
S,-l. mi- for Video T.. ,,..,/.i;/ vol. 12, no. 10, pp. 840-849, 2002.

[84] Y. Liu, Z. Li, and Y. Soh, "A novel rate control scheme for low delay video
communication of H. 264/AVC standard," IEEE Transactions on Circuits and
S'i/. "m for Video T,. h,. l.,i,;/ vol. 17, no. 1, pp. 68-78, 2007.

[85] T. Chiang and Y. Zi wi- "A new rate control scheme using quadratic rate distortion
model," IEEE Transactions on Circuits and S,1, ,ii for Video T. I,,'... ',i/; vol. 7, no.
1, pp. 246-250, 1997.

[86] Z. Li, W. Gao, F. Pan, et al., "Adaptive Rate Control for H. 264," Journal of Visual
Communication and Image Representation, vol. 17, no. 2, pp. 376-406, 2006.

[87] M. Jiang and N. Ling, "On Lagrange multiplier and quantizer adjustment for H. 264
frame-IV r video rate control," IEEE Transactions on Circuits and S, ii'- for Video
T. I,.,.. ..,i/; vol. 16, no. 5, pp. 663-669, 2006.

[88] S. Ma, W. Gao, and Y. Lu, "Rate-distortion analysis for H. 264/AVC video coding
and its application to rate control," IEEE Transactions on Circuits and S, /'ii', for
Video T. ,. ..-,/;, vol. 15, no. 12, pp. 1533-1544, 2005.

[89] H. Yu, Z. Lin, and F. Pan, "An improved rate control algorithm for H. 264," in IEEE
International Symposium on Circuits and S1,1l1 m- 2005, pp. 312-315.

[90] P. Yin and J. Bc.-,' "A new rate control scheme for H. 264 video coding," in
Proceedings of IEEE International Conference on Image Processing, 2004, vol. 1.

[91] J. Xu and Y. He, "A novel rate control for H. 264," in Proceedings of IEEE
International Symposium on Circuits and S,1l1 ,- 2004.









[92] S. Kim and Y. Ho, "Rate control algorithm for H. 264/AVC video coding standard
based on rate-quantization model," in Proceedings of IEEE International Conference
on Multimedia and Expo, 2004, vol. 1.

[93] E. Lam and J. Goodman, "A mathematical analysis of the DCT coefficient
distributions for images," IEEE Transactions on Image Processing, vol. 9, no. 10,
pp. 1661-1666, 2000.

[94] Y. Altunbasak and N. Kamaci, "An analysis of the DCT coefficient distribution
with the H. 264 video coder," in Proceedings of IEEE International Conference on
Acoustics, Speech, and S.:,.,l1 Processing, 2004, vol. 3.

[95] A. Viterbi and J. Omura, Principles of digital communication and coding,
McGraw-Hill New York, 1979.

[96] D. Kwon, M. Shen, and C. Kuo, "Rate control for H. 264 video with enhanced
rate and distortion models," IEEE Transactions on Circuits and S,-I /ii,- for Video
T -, I,,..1..,i;. vol. 17, no. 5, pp. 517-529, 2007.

[97] X. Li, A. Hutter, and A. Kaupa, "Efficient one-pass frame level rate control for
H.264/AVC," Journal of Visual Communication and Image Representation, vol. 20,
no. 8, pp. 585-594, November 2009.

[98] Z. Li, F. Pan, K. Lim, G. Feng, X. Lin, and S. Rahardi i "Adaptive basic unit l1iv
rate control for JVT," in JVT-GO12-rl, 7th Meeting, P ,lla.;, II, Thailand, 2003.









BIOGRAPHICAL SKETCH

Jun Xu was born in Tianmen, Hubei, ('Ci i He got his B.E. and M.S. degrees

at Huazhong University of Science and Technology, Wuhan, China, in 2003 and 2006.

He received the Ph.D. degree in electrical and computer engineering from University of

Florida, Gainesville, FL in August 2010. His research interests include image processing,

video coding and multimedia communication.





PAGE 2

2

PAGE 3

3

PAGE 4

Firstofall,Iamheartilythankfultomyadvisor,ProfessorDapengOliverWu,whoseencouragement,guidanceandsupportenlightenthedevelopmentofmyresearch.Hisenthusiasm,hisinspirationandhisgreateortstoexplainthingsclearlyandsimplyestablishearolemodelforme.Hehasmadeavailablehissupportinanumberofways:encouragement,soundadvice,goodteaching,andlotsofgoodideas.Thisthesiswouldnothavebeenpossiblewithouthisinsightfulguidanceandstricttrainingincreativethinking,logicalreasoningandwritingskills.IwouldalsoliketothankProfessorTaoLi,ProfessorYijunSunandProfessorScottBanksforservingonmydissertationcommitteeandprovidingvaluablefeedbacksonmyresearch.IamindebtedtomyMasteradvisorProfessorJingliZhouforbringingmetotheworldofimageandvideoprocessing.IamthankfultomyocemateinMultimediaCommunicationsandNetworkingLabatUF.Itisgreatfortuneformetojointhisfriendlyfamily.IwouldliketothankseniorlabmemberDr.JieyanFanforhishelpandadviceinmyearlydaysinUS.ThankstoZhifengChen,BingHan,TaoranLu,andWenxingYeforusefuldisscussionsinmanyresearchproblems.IwanttothankDr.XiaochenLi,XihuaDong,YiranLiandYunzhaoLifortheirsupportandfriendship.MyappreciationalsogoestoShanshanRen,ZiyiWangandLinZhangfortheirsupport,sharing,andgenerositytothisgroup.ThankstoDr.HuilinXu,WenshuZhang,andDongliangDuan.IwouldliketothankYanLiandLuChen,myfriendsforovertenyears,fortheirkindnessandsupporteversincewemet.Itwasrealgreattomeetandknowallthesefriendsandspendwonderfulfouryearsherewiththem.Iwillcherisheverymomentthatweweretogether.IalsowanttothankZongruiDing,LeiYang,QianChen,YuejiaHe,YakunHu,JiangpingWang,andZhengYuan. 4

PAGE 5

5

PAGE 6

page ACKNOWLEDGMENTS ................................. 4 LISTOFTABLES ..................................... 9 LISTOFFIGURES .................................... 10 ABSTRACT ........................................ 13 CHAPTER 1INTRODUCTION .................................. 15 1.1Motivation .................................... 15 1.2Outline ...................................... 19 2RIPPLET-ITRANSFORM ............................. 22 2.1Introduction ................................... 22 2.2ContinuousCurveletTransform ........................ 24 2.3ContinuousRipplet-ITransform ........................ 27 2.3.1Ripplets ................................. 27 2.3.2ContinuousRipplet-ITransform .................... 29 2.4DiscreteRipplet-ITransform .......................... 32 2.5TightFrame ................................... 33 2.6ExperimentalResults .............................. 35 2.6.1NonlinearApproximation ........................ 35 2.6.1.1Ripplet-Iwithdierentdegrees ............... 37 2.6.1.2Comparisonwithothertransforms ............. 37 2.6.2ImageCompression ........................... 39 2.6.2.1Comparisononcroppedimages ............... 40 2.6.2.2Comparisonontexture-tichimages ............. 40 2.6.2.3Comparisononnaturalimages ................ 40 2.6.3ImageDenoising ............................. 41 2.7Summary .................................... 42 3RIPPLET-IITRANSFORM ............................. 49 3.1Introduction ................................... 49 3.2GeneralizedRadonTransform ......................... 51 3.3Ripplet-IITransform .............................. 54 3.3.1ContinuousRipplet-IITransform .................... 54 3.3.1.1Forwardtransform ...................... 56 3.3.1.2Inversetransform ....................... 57 3.3.2ContinuousOrthogonalRipplet-IITransform ............. 58 3.3.3DiscreteRipplet-IITransform ..................... 59 6

PAGE 7

61 3.4PropertiesofRipplet-IITransform ...................... 62 3.5ExperimentalResults .............................. 66 3.5.1TextureClassication .......................... 66 3.5.2ImageRetrieval ............................. 70 3.6Summary .................................... 72 4SPARSITYBASEDDE-ARTIFACTINGINVIDEOCODING .......... 73 4.1Introduction ................................... 73 4.2Framework .................................... 76 4.3AlgorithmDescription ............................. 76 4.3.1SimilarPatchSetFormulation ..................... 78 4.3.2SparsityEnforcement .......................... 78 4.3.3MultipleHypothesisFusion ....................... 80 4.3.4ApplicationtoCompressionArtifactRemoval ............ 81 4.4ComplexityAnalysis .............................. 81 4.4.1SearchingComplexity .......................... 82 4.4.2TransformComplexity ......................... 83 4.5PerformanceComparison ............................ 83 4.5.12Dor3Dpatch ............................. 84 4.5.2TransformDimensionality ....................... 85 4.5.3QuantizationParameters ........................ 85 4.5.4VisualQuality .............................. 86 4.6Summary .................................... 86 5SPARSITYENHANCEDVIDEOCODINGTECHNIQUES ........... 90 5.1Introduction ................................... 90 5.2AdaptiveBlockPattern ............................ 93 5.2.1HeterogeneousBlockPattern ...................... 93 5.2.2ImplementationDetails ......................... 94 5.3EnhancedIntraPrediction ........................... 94 5.3.1IntraSimilar-blockSearch ....................... 94 5.3.2ImplementationDetails ......................... 97 5.4ExperimentalResults .............................. 97 5.4.1HeterogeneousBlockPattern ...................... 98 5.4.2EnhancedIntraPrediction ....................... 98 5.4.3CombinationofAlgorithms ....................... 98 5.5Summary .................................... 99 6SPARSITYBASEDRATECONTROLINVIDEOCODING .......... 105 6.1Introduction ................................... 105 6.2GammaRateTheory .............................. 108 6.2.1ReverseWater-lling .......................... 108 6.2.2GammaRateTheory .......................... 110 7

PAGE 8

........................... 118 6.3SparsityBasedRateControl .......................... 118 6.3.1RateControlinVideoCoding ..................... 118 6.3.2RateControlwithoutConstraintsonDistortionFluctuation .... 121 6.3.2.1Algorithmframework ..................... 121 6.3.2.2R-Qmodel .......................... 125 6.3.2.3Implementationdetails .................... 127 6.3.2.4Experimentalresults ..................... 128 6.3.3RateControlwithConstraintsonDistortionFluctuation ...... 132 6.3.3.1Algorithmframework ..................... 132 6.3.3.2Implementationdetails .................... 136 6.3.3.3Experimentalresults ..................... 137 6.4Summary .................................... 140 7CONCLUSION .................................... 142 7.1Summary .................................... 142 7.2FutureWork ................................... 144 REFERENCES ....................................... 145 BIOGRAPHICALSKETCH ................................ 152 8

PAGE 9

Table page 2-1PSNRcomparisonofRipplet-IandJPEG2000at0.03125bpp .......... 40 2-2AveragePSNRgainofripplet-I(c=1,d=3)basedcodec,comparedtoJPEGandJPEG2000,respectively ............................. 41 3-1Informationextractedfromdata ........................... 66 3-2Errorrateunderdierenttransformsusingfeatureextraction1 ......... 69 3-3Errorrateunderdierenttransformsusingfeatureextraction2 ......... 70 3-4Averageretrievalrateunderdierenttransformsusingfeatureextraction1 ... 71 3-5Averageretrievalrateunderdierenttransformsusingfeatureextraction2 ... 71 4-1Performance(PSNRgainindB)of2Dand3Dpatchesinvideoswithdierentmotioncharacteristics. ................................ 84 5-1SyntaxaddedtoMBheader. ............................. 94 5-2SyntaxaddedtoMBheader. ............................. 97 5-3Encodingconguration ................................ 98 6-1ModelaccuracyforRD 125 6-2PerformancecomparisonofH.264andproposedalgorithmusingGaussianmodel 131 6-3PerformancecomparisonofH.264andproposedalgorithmusingLaplacianmodel 132 6-4PerformancecomparisonofH.264andproposedalgorithmusingCauchymodel 133 9

PAGE 10

Figure page 2-1Thetilingofpolarfrequencydomain.Theshadowed`wedge'correspondstothefrequencytransformoftheelementfunction. .................... 25 2-2Ripplet-Ifunctionsinspatialdomainwithdierentdegreesandsupports,whicharealllocatedinthecenter,i.e.,b=0. ....................... 28 2-3Thecomparisonofcoecientsbetweenripplet-Itransformandwavelettransform. 36 2-4Testimages. ...................................... 38 2-5ComparingnonlinearapproximationperformanceofrippletswithxedsupportanddierentdegreescorrespondingtothetestimagesinFig. 2-4 ........ 39 2-6Performancecomparisons. .............................. 43 2-7Thevisualqualitycomparisonbetweenripplet-IbasedimagecodecandJPEG2000forapatchcroppedfrom`barbara',whenbppisequalto0.3. ........... 44 2-8Texture-richimagesusedinourexperiment .................... 45 2-9PSNRvs.bppforripplet-Ibasedimagecodec,JPEGandJPEG2000. ...... 46 2-10Thevisualqualitycomparisonbetweenripplet-IbasedimagecodecandJPEG2000for`mandrill',whenbppisequalto0.25. ...................... 47 2-11Ascale-updetailsofdenoisedtestimage`barbara'.Thestandardvarianceofnoiseis15. ....................................... 48 3-1CurvesdenedbyEq.( 3{3 )inCartesiancoordinates.(a)d=1.(b)d=2.(c)d=3.(d)d=1.(e)d=2.(f)d=3. .................... 52 3-2Ripplet-IIfunctionsinCartesiancoordinates(x1,x2)(a)a=1,b=0,d=2and=0.(b)a=2,b=0,d=2and=0.(c)a=1,b=0.05,d=2and=0.(d)a=1,b=0,d=2and=30. ..................... 55 3-3Ripplet-IIfunctionsinCartesiancoordinates(x1,x2)(a)a=1,b=0,d=1and=0.(b)a=2,b=0,d=1and=0.(c)a=1,b=0.05,d=1and=0.(d)a=1,b=0,d=1and=30. ..................... 56 3-4Gaussianimageswithacurveedge.Toprow:originalimagef(x,y);Middlerow:Magnitudeof2DFouriertransform;Bottomrow:Magnitudeof2DFouriertransformaftersubstitutingthepolarcoordinate(r0,0)with(p ................................. 60 3-5Ripplet-IIwithdierentdegrees. .......................... 63 10

PAGE 11

................................. 65 3-7Texturesusedintextureclassication. ....................... 67 3-8Texturesrotatedwithdierentangles. ....................... 68 4-1Codingartifacts:(A)Edgedistortion.(B)Ringingeects.(C)Texturedistortion.(D)Blockyartifacts. ................................. 74 4-22Dand3Dpatchesinavideosequence. ...................... 77 4-3Flowchartoftheproposedframework.2Dpatchesareusedfordemonstration. 77 4-4Patchsortingandpacking.Thedegreeofgrayinpatchesindicatesthesimilarity.Thelowerthedegreeis,themoresimilarthepathistothereferencepatch. .. 79 4-5Deartifactinglterasapost-processingtoolinavideoencoder. ......... 82 4-6Deartifactinglterasanin-looptoolinavideoencoder. ............. 82 4-7Performanceofdierenttransformdimensions. ................... 87 4-8Performancefordierentquantizationparameters. ................ 88 4-9Visualcomparisonofdetailedcrops.LeftcolumniscodedbyH.264.Rightcolumnislteredbyproposedlter. ............................. 89 5-1Hybridvideocodingdiagram. ............................ 91 5-2PartitionofMB. ................................... 92 5-3Intrapredictiondirectionsfor4x4blocks. ...................... 93 5-4Blockpatterncomparison.(A)HomogeneousblockpatterninH.264.(B)Proposedheterogeneousblockpattern. ............................. 95 5-5ExampleofsimilarMBs.TwoMBsindicatedbyblockrectanglesareverysimilartoeachother. ..................................... 96 5-6RDplotsofH.264andproposedalgorithmwithoutoverhead. .......... 99 5-7RDplotsofH.264andproposedalgorithmwithoverhead. ............ 100 5-8RDplotsofH.264andproposedalgorithmwithoutoverhead. .......... 101 5-9RDplotsofH.264andproposedalgorithmwithoverhead. ............ 102 5-10RDplotsofH.264andproposedalgorithmwithoutoverhead. .......... 103 5-11RDplotsofH.264andproposedalgorithmwithoverhead. ............ 104 11

PAGE 12

......................... 105 6-2Rate-distortionfunctionforaGaussiansource.Achievablerate-distortionregionisthegrayarea. .................................... 106 6-3Reversewater-llingfor7Gaussiansources .................... 109 6-4ControllableregioninR{Dplane. ......................... 111 6-5\(R)function. .................................... 119 6-6TherelationbetweenRandD 126 6-7EncodingprocesswithratecontrolinH.264. .................... 128 6-8RatedistortioncomparisonbetweenH.264andproposedalgorithmwithdierentmodels. ........................................ 130 6-9RatedistortioncomparisonswithD=0.8dB. ................... 138 6-10RvsDof`hallmonitor'bymodiedH.264. .................... 139 6-11RvsDof`hallmonitor'byproposedalgorithm. ................. 140 12

PAGE 13

Thewideusageofdigitalmultimediaconsumerelectronicsleadstotherapidexplosionoftheamountofimageandvideodataforsharing,storageandtransmissionovernetworks.Howtondecientalgorithmstoprocessthesedataraisesgreatchallenges.Sparsityisausefulmeasurementoftheeciency.Inthisdissertation,weaddresstheseproblemsbasedonexploringthesparsityofdatarepresentationsfromdataindependentanddatadependentperspectives. Intherstpartofthisdissertation,westudyimagerepresentationusingdataindependentapproaches.Forimagerepresentation,transformscanbeusedtoconvertoriginaldataintotransformcoecientstoeliminatethecorrelationamongoriginaldata.Wefocusonimageswithsingularities,whichareverydiculttorepresentbyconventionaltransforms. WeproposeRipplettransformtypeI(Ripplet-I),whichgeneralizesCurvelettransformbyintroducingsupportanddegreeparameters.CurvelettransformisjustaspecialcaseofRipplet-Itransformwithc=1,d=2.Ripplet-Itransformcanrepresentimageswithsingularitiesalongarbitrarycurveseciently.Ripplet-ItransformachievessparserrepresentationthanCurvelettransformanddemonstratessuperiorperformanceinapplicationssuchasimagecompressionandimagedenoising. FollowingthestrategyofRidgelettransform,weproposeRipplettransformtypeII(Ripplet-II)basedongeneralizedRadontransform.Ripplet-IItransformmaps 13

PAGE 14

Inthesecondpartofthisdissertation,westudyperformanceimprovingapproachesbyenhancingsparsityofimagerepresentationsusingtheinformationfromdata.Theseapproachesaredatadependentandvaryfromcasetocase.Toremoveartifactsintroducedbyavideocodec,weenhancethesparserepresentationoftruesignalstoremoveartifactsandpreservetruesignals;toimprovevideocodingeciency,weprovidebetterintrapredictionandadaptiveblockpatterntoenhancethesparsityofresiduals;toprovidevideoswithsmoothquality,weproposesparsitybasedratecontrolalgorithmforvideocodingwithconstraintsondistortionuctuations. 14

PAGE 15

Imageandvideodatausuallyhavealotofredundancies.Inmostcases,pixelintensitiesofimagesarehighlycorrelatedtotheirneighborpixels.Forvideosequences,theyareevencorrelatedtopixelsfromadjacentframes.Thecorrelationencouragesresearcherstoexploreapproachestoeliminatetheredundancy.Fromastatisticalperspective,eachimageorvideocorrespondstoapointinanextremelyhighdimensionalspace.Withhighprobabilities,imagesandvideosusuallybelongtoasubspacewithamuchlowerdimension.Thentheotherdimensionsusedtodescribethedatawouldberedundantandwillnotdistortthedataifdiscarded. Fordecades,researchershavebeendedicatedtoexploreapproachesthatcaneliminateredundantinformationandprovideecientrepresentationsofdata.Toquantitativelymeasuretheeciencyofrepresentation,weusetheterm\sparsity"[ 1 ].Sparsitydenotesthenumberofnon-zeroitemsinaset.Asparserepresentationtendstohavealargeportionofzeros.Sparserepresentationisequivalenttoecientrepresentation. Theapproachesintheliteraturefallintotwocategories:dataindependentanddatadependentapproaches.Theformerassumesthatdatafollowacertainmodeloracombinationofvariousmodels.Wecanrepresentdatausingtheparametersof 15

PAGE 16

2 ],whichstudieshowtorepresentafunctionasthesuperpositionofatomwaves,representsmostapproachesinthiscategory.AsimpleexampleistheFourierseries[ 3 ],whichrepresentsafunctionasthesuperpositionofsineandcosinefunctions.Forinstance,ifthetaskistorepresentasinefunctionwithinniteduration.Sincethesinefunctionisperiodic,onlythevaluesinsideoneperiodareenoughtorepresentthewholefunction.Still,therearelotsofvaluesjustinoneperiod.However,ifitisrepresentedasaFourierseries,onlyonecoecientisenoughtorepresentit. Atransformistheprocessthatmapstheoriginalfunctiontoasetofnumberscalledtransformcoecients.Transformcodingplaysaveryimportantroleinaudio,imageandvideocompressionapplications.Thehistoryoftransformcodinggoesbacktothe1950's[ 4 ].SignalsfromavocoderweremodeledasastochasticprocessandshowntobecompressibleusingKarhunen-Loevetransform(KLT)[ 5 ]madeupoftheeigenvectorsofthecorrelationmatrix.Karhunen-Loevetransformisthebesttransformwiththeleastmeansquarederrorinlinearapproximation.However,KLTissignaldependent.Itisnotusefulinrealapplicationsthatemployatransform.Itwillbeecienttochoosetransformswithxedbases.Variouskindsoftransformsarestudiedfordecades. Discretecosinetransform(DCT)[ 6 ]wasproposedasanapproximationtoKLTwiththeassumptionofrstorderGauss-Markovprocesswhenthecorrelationishigh(largerthan0.9).DCTwaslaterusedasthekeytransforminthepracticalimagecoders.DCTcombinedwithscalarquantizationandentropycodingwasstandardizedbythejointpictureexpertsgroup(JPEG)inthelate1980's.TheJPEGimagecompressionstandardserveswidelyallovertheworld[ 7 ].Inthe1990's,wavelettransformattractedalotattentioninacademic[ 8 ][ 9 ][ 10 ][ 11 ][ 12 ].Wavelettransformyieldsamultiresolutionrepresentationofsignalswhichconsistsofoctave-bandfrequencydecomposition.Thedecompositionprovidesgoodfrequencyselectivityatlowerfrequenciesandgoodtime 16

PAGE 17

13 ].ThenewstandardachievesbettercompressioneciencythanJPEG. AlthoughDCTandDWTaresuccessfulinimagecompression,howtorepresentasignalecientlyisstillabigchallenge.Smoothfunctionscanbesparselyrepresentedbymanytransforms.However,whentherearediscontinuities(singularitypoints)inasignal,itisnotaneasyjobtondagoodrepresentationforthesignal.SingularitieswillyieldinnitenumberofFourierseriestermsorinnitenumberofnonzeroFouriertransformcoecients[ 3 ][ 14 ][ 15 ].Ifwereconstructthesignalwithnitenumberofterms,Gibbsphenomenaappearsaroundsingularitypoints[ 16 ][ 17 ].Onlywhenalltermsareused,Gibbsphenomenacanberemovedfromthereconstructedsignal.Wavelettransformsolvedthisproblembyintroducingmultiscaleanalysis[ 18 ].Thesingularitypointscanbecapturedbychangingthescaleandpositionofwaveletfunctions.Wavelettransformcanresolveonedimensionalsingularities.Whenthedimensionofsignalsincreases,higherdimensionaltransformsusuallyconsistoftensorproductionoflowdimensionaltransforms.However,simpletensorproductionnolongerresolveshighdimensionalsingularities.Ridgelettransform[ 19 ][ 20 ],Curvelettransform[ 21 ][ 22 ]andShearlettransform[ 23 ][ 24 ]wereintroducedtosolvesingularitiesalongstraightlinesandC2curves.Nevertheless,thesolutionforasparserepresentationforimageswithsingularitiesalongarbitrarycurvesisstillanopenissue.Intherstpartofthisdissertation,weproposetwodataindependenttransformsnamedripplettransformtypeIandII.Ripplettransformsaimtoresolvetwodimensionalsingularitiesinimagesfromdierentperspectives.Thenewtransformsareabletoobtainsparserepresentationsofimageswithsingularities.Thesesparserepresentationscanassistfurtherprocessingofimagedatainvariousapplicationssuchasimagedenoising,imagecompressingandimageclassication. 17

PAGE 18

Inthesecondcategory,researchersndthatifwedonotcareabouttheexacttypeofmodelforthedata,wecanstillsucceedinsolvingalotofproblems.Infact,wecanmakeadvantageofsomeusefulobservationsthattherearealotofredundanciesinsideimageandvideodata.Imagedatahaveplentyofspatialredundancies.Videosequencesarehighlycorrelatedalongtimeinadditiontospatialredundancy.Basedonthehighcorrelation,wecanusedatathemselvestoassistdataprocessing.Forexample,invideocodingusingpartofthedatatoserveasapredictionorreference,mostofredundanciescanberemoved.Comparedtoprocessingthedatadirectly,itisusuallymucheasierandmoreecienttocompressthepredictionerrorsandpredictionmodes.Theactualalgorithmsvaryalotaccordingtoapplications. Wecanfurtherclassifyapproachesintotwocategoriesbasedontherangeofcorrelation:localandnon-local.Localmeansthatonlydatainalocalneighborhoodwillbeused.Theassumptionbehindisthatthecorrelationdecreasesasthedistanceincreasesandthecorrelationcanbeconsideredaszeroifthedistanceislargeenough,whichistrueinmostdata.ThiscategoryincludesLinearPredictiveCoding(LPC)[ 25 ],WarpedLinearPredictiveCoding(WLPC)[ 26 ][ 27 ],DierentialPulse-codemodulation(DPCM) 18

PAGE 19

28 ],adaptiveDPCM[ 29 ]inspeechprocessing,autoregressivemovingaverage(ARMA)model[ 30 ]instatistics,andlocalovercompleterepresentation[ 31 ]forimageprocessing.Non-localstandsforapproachessearchingforredundanciesoverthewholedataspace.Sometimestheassumptionthatlocalneighborsareusefulforprocessingisviolated.Toimprovetheperformance,acertaincriteriononcorrelationorsimilarityisintroducedtoexcludeuselessdataandenforcethesimilarity.Intrapredictionandmotioncompensationinvideocoding[ 32 ][ 33 ]usethemeansquarederrortondthebestmatch,whichwillimprovethecompressioneciency.Block-Matchingand3Dltering(BM3D)[ 34 ]isthestate-of-the-artimagedenoisingalgorithm.BM3Dseparatestruesignalandnoisebyenforcingsparserepresentationofsimilarblocksfromalloveranimage.Variations[ 35 ][ 36 ]basedonBM3Dalsoachievedsuperiorperformanceinseveralapplicationsinimageandvideoprocessing.Thesealgorithmsallenjoythebenetofexploringthesparsityofsignals. Inthesecondpartofthisdissertation,weaddressseveralpracticalproblemsrelatedtovideocoding.Weproposedsparsitybasedalgorithmstoimprovethecodingeciencyofmodernvideocodingsystems.Sparsepropertiesinvideocodingsystemsareexploredandusedwiselytoservefordierentaspectsinvideocodingsystems. Chapter 2 proposesanewtransformcalledRipplettransformtypeIbygeneralizingCurvelettransform.Weintroduceasupportcanddegreedinadditiontoscale,locationanddirectionparametersinthedenition,whilethecurvelettransformisjustaspecialcaseoftheripplettransformtypeIwithc=1andd=2.ThenewtransformcanadapttovarioussingularitiesalongarbitraryCdcurves.Theexibilityenablesthecapabilityofyieldingamoreecientrepresentationforimageswithsingularities.Inparticular,wedevelopforwardandbackwardRipplettransformtypeIforbothcontinuousanddiscretecases.Toevaluatetheperformanceofproposedtransform,RipplettransformtypeIis 19

PAGE 20

InChapter 3 ,weintroduceRipplettransformtypeIIbasedongeneralizedRadontransform.Thenewtransformconverts2Dsingularitiesinto1DsingularitiesthroughgeneralizedRadontransform.Then1Dwavelettransformisusedtoresolvethe1Dsingularities.BothcontinuousanddiscreteRipplettransformtypeIIaredened.OrthogonalripplettransformtypeIIisintroducedtofurtherimprovesparsityoftherepresentation.Propertiesofthenewtransformareexplored.Therotationinvariantpropertyachievesgoodperformanceinimageclassicationandretrievalfortextureimages. Chapter 2 andChapter 3 presenttransformbasedapproachestoachievesparserepresentations.Incertainapplications,directtransformmaynotyieldasparseenoughrepresentationofdata.Non-transformbasedpreprocessingtechniquescanbeused,sincethosetechniquesthemselvescanprovidesparserepresentations.Inthefollowingchapters,wewillpresentseveralsparsebasedtechniquesappliedinvideoprocessingandcompression. Chapter 4 presentsapproachesthatlearnfromdata.Inparticular,weproposeageneralframeworkthatexploresselfsimilaritiesinsidesignalstoenhancesparserepresentationsoftruesignals.Theframeworkuniesthreesparsitybaseddenoisingtechniquesandappliesthemforvideocompressionartifactremovalproblem.Wecompareandanalyzethethreetechniquesfromtheaspectsofoperationatom,transformdimensionality,andquantizationimpact.Thecomparisonandanalysiscanserveasaguidelineofapplyingsparsitybaseddenoisingtechniquestorelatedproblems.Theframeworkintroducesseveralfreedomineachcomponent,whichcanadaptstovariousapplicationsaslongassparsitybasedapproachcansolvetheproblem. InChapter 5 ,wepresentseveraltechniquestoimprovethecodingeciencyofvideocodecthroughsparsityenhancement.Weproposemoreaccuratepredictionthrough 20

PAGE 21

InChapter 6 ,weexploretherate-distortionbehaviorswhenintroducingnewconstraintsondistortionuctuations.Gammaratetheoryisproposedtoprovideguidancefordesigningratecontrolschemes.Thereisatradeobetweenrateanddistortionuctuation.Later,asparsitybasedratecontrolalgorithmisproposedtoimprovethecodingeciencywithlimitationsonbitrate.Variousrate-distortionmodelsareemployedtoderivetheoptimalbitallocationforcaseswithandwithoutconstraintsondistortionuctuations.Intheexperiments,weevaluatetheproposedalgorithmby(R,D,D)basedonthegammaratetheory.Experimentalresultsdemonstratethatthereisindeedatradeoinpracticalratecontrolschemes. Chapter 7 summarizesthewholedissertation. 21

PAGE 22

2 ]providesamethodologytorepresentsignalseciently.Specically,harmonicanalysisisintendedtoecientlyrepresentasignalbyaweightedsumofbasisfunctions;heretheweightsarecalledcoecients,andthemappingfromtheinputsignaltothecoecientsiscalledtransform.Inimageprocessing,Fouriertransformisusuallyused.However,Fouriertransformcanonlyprovideanecientrepresentationforsmoothimagesbutnotforimagesthatcontainedges.Edgesorboundariesofobjectscausediscontinuitiesorsingularitiesinimageintensity.Howtoecientlyrepresentsingularitiesinimagesposesagreatchallengetoharmonicanalysis.Itiswellknownthatone-dimensional(1D)singularitiesinafunction(whichhasnitedurationorisperiodic)destroythesparsityofFourierseriesrepresentationofthefunction,whichisknownasGibbsphenomenon.Incontrast,wavelettransformisabletoecientlyrepresentafunctionwith1Dsingularities[ 11 12 ].However,typicalwavelettransformisunabletoresolvetwo-dimensional(2D)singularitiesalongarbitrarilyshapedcurvessincetypical2Dwavelettransformisjustatensorproductoftwo1Dwavelettransforms,whichresolve1Dhorizontalandverticalsingularities,respectively. Toovercomethelimitationofwavelet,ridgelettransform[ 19 37 ]wasintroduced.Ridgelettransformcanresolve1Dsingularitiesalonganarbitrarydirection(includinghorizontalandverticaldirection).RidgelettransformprovidesinformationaboutorientationoflinearedgesinimagessinceitisbasedonRadontransform[ 38 ],whichiscapableofextractinglinesofarbitraryorientation. Sinceridgelettransformisnotabletoresolve2Dsingularities,CandesandDonohoproposedtherstgenerationcurvelettransformbasedonmulti-scaleridgelet[ 39 40 ]. 22

PAGE 23

41 42 ].Curvelettransformcanresolve2Dsingularitiesalongsmoothcurves.Curvelettransformusesaparabolicscalinglawtoachieveanisotropicdirectionality.Fromtheperspectiveofmicrolocalanalysis,theanisotropicpropertyofcurvelettransformguaranteesresolving2DsingularitiesalongC2curves[ 21 41 { 43 ].Similartocurvelet,contourlet[ 44 45 ]andbandlet[ 46 ]wereproposedtoresolve2Dsingularities. However,itisnotclearwhyparabolicscalingwaschosenforcurvelettoachieveanisotropicdirectionality.Regardingthis,wehavetwoquestions:Istheparabolicscalinglawoptimalforalltypesofboundaries?Ifnot,whatscalinglawwillbeoptimal?Toaddressthesetwoquestions,weintendtogeneralizethescalinglaw,whichresultsinanewtransformcalledripplettransformTypeI.Intherestofthischapter,weuseRipplet-Iinshort.Ripplet-Itransformgeneralizescurvelettransformbyaddingtwoparameters,i.e.,supportcanddegreed;hence,curvelettransformisjustaspecialcaseofripplet-Itransformwithc=1andd=2.Thenewparameters,i.e.,supportcanddegreed,provideripplet-Itransformwithanisotropycapabilityofrepresentingsingularitiesalongarbitrarilyshapedcurves.Theripplet-Itransformhasthefollowingcapabilities. 23

PAGE 24

Theremainderofthischapterisorganizedasbelow.InSection 2.2 ,wereviewthecontinuouscurvelettransforminspatialdomainandfrequencydomain,andanalyzetherelationsbetweenthem.InSection 2.3 ,wegeneralizethescalinglawofcurvelettodenerippletsandintroducecontinuousripplet-Itransformandinversecontinuousripplet-Itransform.Thenwediscussthediscretizationofripplet-ItransforminSection 2.4 .WeanalyzerippletfunctionsfromtheperspectiveofframesinSection 2.5 .Section 2.6 presentsexperimentalresultsthatdemonstratethegoodpropertiesofripplets. 39 ],[ 40 ] whereR=264cossinsincos375istherotationmatrix,whichrotatesradians.~xand~bare2Dvectors.a~00istheelementcurveletfunction. 24

PAGE 25

40 ]. where^a(r,!)istheFouriertransformofa~00inpolarcoordinatesystem.W(r)isa`radialwindow'andV(!)isan`angularwindow'.Thesetwowindowshavecompactsupportson[1=2,2]and[1,1],respectively.Theysatisfythefollowingadmissibilityconditions r=1,(2{3) Thesetwowindowspartitionthepolarfrequencydomaininto`wedges'showninFig. 2-1 Figure2-1. Thetilingofpolarfrequencydomain.Theshadowed`wedge'correspondstothefrequencytransformoftheelementfunction. FromEq.( 2{2 ),( 2{3 )and( 2{4 ),weknowthattheFouriertransformofcurveletfunctionhasacompactsupportinasmallregionwhichistheCartesianproductof 25

PAGE 26

2a,2 41 ],[ 42 ].Theparabolicscalingusedinthedenitionofcurveletfunctionsguaranteestheeectivelengthandwidthoftheregiontosatisfy:widthlength2andleadstoanisotropicbehaviorofcurvelets,whichmakescurvelettransformsuitableforresolvingarbitrarywavefront.Theparabolicscalingisthemostimportantpropertyofcurvelettransformandalsothekeydierencebetweenthecurveletandtherotatedwavelet. Givena2Dintegrablefunctionf(~x),thecontinuouscurvelettransformisdenedastheinnerproductoff(~x)andthecurveletfunction[ 41 ],[ 42 ],[ 22 ] whereC(a,~b,)arethecurveletcoecientsand Infact,thecurvelettransformonlycapturesthecharacteristicsofhighfrequencycomponentsoff(~x),sincethescaleparameteracannottakethevalueofinnity.Sothe`full'continuouscurvelettransformconsistsofne-scalecurvelettransformandcoarse-scaleisotropicwavelettransform.The`full'curvelettransformisinvertible.Wecanperfectlyreconstructtheinputfunctionbasedonitscurveletcoecients.Withthe`full'curvelettransform,theParsevalformulaholds[ 41 ],[ 42 ],[ 22 ].Iff(~x)isahigh-passfunction,itcanbereconstructedfromthecoecientsobtainedfromEq.( 2{5 )through 26

PAGE 27

2.2 ,weknowthatparabolicscalingusedincurveletsleadstoresolvingof2Dsingularities.However,thereisnoevidencetoshowthattheparabolicscalingistheoptimalscalinglaw.Wecandenethescalinglawinamorebroaderscopeandmoreexibleway.Theripplet-IfunctioncanbegeneratedfollowingthesamestrategyinEq.( 2{1 ) wherea~00(~x)istheripplet-IelementfunctionandR=264cossinsincos375istherotationmatrix.Wedenetheelementfunctionofripplet-Iinfrequencydomainas where^a(r,!)aretheFouriertransformofa~00(~x).W(r)isthe`radialwindow'on[1=2,2]andV(!)isthe`angularwindow'on[1,1].Theyalsoobeytheadmissibilityconditions( 2{3 )and( 2{4 ). Thesetoffunctionsfa~bgisdenedasripplet-Ifunctionsorrippletsforshort,becauseinspatialdomainthesefunctionshaveripple-likeshapes.cdeterminesthesupportofrippletsanddisdenedasthedegreeofripplets.Curveletisjustthespecialcaseofripplet-Iforc=1,d=2.Fig. 2-2 showsrippletswithdierentcanddierentdinspatialdomain.FromFig. 2-2 ,wecanseethatripplet-Ifunctionsdecayveryfastoutsidetheeectiveregion,whichisanellipsewiththemajoraxispointinginthedirectionof 27

PAGE 28

Ba=3,=3=16,c=1.5,d=2 Ca=4,=3=16,c=1,d=4 Da=4,=3=16,c=1.5,d=4 Ripplet-Ifunctionsinspatialdomainwithdierentdegreesandsupports,whicharealllocatedinthecenter,i.e.,b=0. 28

PAGE 29

Therippletsasthegeneralizationofcurvelethavealmostallthepropertiesofcurveletexcepttheparabolicscaling.Ripplet-Itransformcangetmulti-resolutionanalysisofdata.Foreachscale,rippletshavedierentcompactsupportssuchthatrippletscanlocalizethesingularitiesmoreaccurately.Ripplet-Ifunctionsarealsohighlydirectionaltocapturetheorientationsofsingularities. whereR(a,~b,)aretheripplet-Icoecients.Whentheripplet-Ifunctionintersectswithcurvesinimages,thecorrespondingcoecientswillhavelargemagnitude,andthecoecientsdecayrapidlyalongthedirectionofsingularityasa!0. Theripplet-ItransformdenedinEq.( 3{15 )hasthesameissuesascurvelettransformdoes,whichisthatthecontinuousripplet-Itransformcanonlycapturethe 29

PAGE 30

Nowwetransformimagesintoanotherdomainthatwecallrippletdomain.Thechallengesarisewhenwetrytoreconstructimagesfromripplet-Icoecients.Thetheoremsbelowintroducetheinverseripplet-Itransform. 2{3 )canberewrittenas a=Za0r0W(a)2da a=Za00W(ar)2da a=1(2{13) BasedontheadmissibilityconditionEq.( 2{3 ),wehave Foraspecialripplet-Ia~0(x),itsFouriertransformhasthepropertyasbelow. a3=Za00Z20W(ar)2V(a1 ddda a3=1 30

PAGE 31

Wehavea~b=a~0(xb),so Accordingtothepropertyofconvolution,wecanobtaintheFouriertransformofgas Using( 2{15 ),weget a3=ZZj^a0(!)j2^f(!)dda a3=^f(!)(2{19) Further, a3ej!xd!=ZZga,(x)dda a3=ZZZha~b,fia~bdadbd UsingPlancherelformulaandEq.( 2{15 ),wehave a3d!=Zj^f(!)j2d!=kfk2L2 31

PAGE 32

Sincetheissueofinterestisjustthene-scaleelementsorhighfrequencybands,thechoiceofthewavelettransformforthecoarse-scalecanbeveryexible.Similarly,Theorem 2 canbeeasilyprovedusingthesameargumentsin[ 42 ]. Thediscretizationofcontinuousripplet-Itransformisactuallybasedonthediscretizationoftheparametersofripplets.Forthescaleparametera,wesampleatdyadicintervals.Thepositionparameterbandrotationparameteraresampledatequalspacedintervals.a,~bandaresubstitutedwithdiscreteparametersaj,~bkandl,whichsatisfythataj=2j,~bk=[c2jk1,2j=dk2]Tandl=2 nc!).(2{24) 32

PAGE 33

Thediscreteripplet-ItransformofanMNimagef(n1,n2)willbeintheformof whereRj,~k,laretheripplet-Icoecients. Theimagecanbereconstructedthroughinversediscreteripplettransform Ifwehaveasetoffunctionsfk(x)=(x1k1=A,x2k2=B)g,thenwehaveg(x)=Xkhf,kik(x)

PAGE 34

f)(k)=Xkk(x)Z k(x)dx=Xkjhf,kij2XkZj^k(!)j2d! Fromthedenitionofripplets,alltranslatedversionofelementripplet-Iwillcoverallthebands.ThenwehavePkRj^k(!)j2d!=1.So Thetheoremcanbeprovedwiththetranslationparameter~bandl=0,basedonthelemmaabove.Forarbitraryl,wecanrotatethecoordinatetogeta~l=0,whereLemma 3 applies. 34

PAGE 35

2 ]ofimagesisadoptedasacommoncomparisonapproach.Supposewehaveorthonormalbasisfkgandthecorrespondingcoecientsck=.Thesecoecientsaresortedindescendingorderwithrespecttothemagnitude.Theindexkisdenedbyjc0jjc1jjc2jjckjjcn1jjcnj... Sinceripplet-Itransformprovidesatightframe,theconcentrationofripplet-IcoecientswillleadtomoreaccurateapproximationinNLA.Thefasterthecoecientsdecay,themorecompactenergywillbeallocatedtothefewerlargecoecients.Todemonstratethedecayrateofripplet-Itransformcoecients,werstsorttheripplet-IcoecientswithrespecttotheirmagnitudesandcomparethemtosortedwaveletcoecientsinFig. 2-3 .Itsuggeststhatthecoecientsofripplet-Itransformdecayfasterthanthoseofwavelettransform. Weusepeaksignal-to-noiseratio(PSNR)versusnumberofretainedcoecientstomeasurethequalityofreconstructedimages.PSNRisdenedas 35

PAGE 36

BBarbara Thecomparisonofcoecientsbetweenripplet-Itransformandwavelettransform. wherefmaxisthemaximumvalueofimageintensitiesandmseisthemeansquareerrorbetweenthereconstructedimage~fMNandoriginalonefMN 36

PAGE 37

2-4 .Multiplelinesandcurvesaresynthesizedwithdierentcoordinatestoprovidesingularitiesalongdierentcurves.ThetruncatedGaussianimage(Fig. 2-4A )presentsasmoothchangingpartaswellassingularityintroducedbytruncating. TheperformancecomparisonbetweenrippletswithxedsupportbutdierentdegreesisshowninFig. 2-5 .ToachievethesamePSNR,highdegreeripplet-Ineedsfewercoecientsthanlowdegreeripplet.Thereisabigperformancegapbetweendegree1ripplet-Iandothers.Forthesamenumberofcoecients,rippletwithdegree1achievesalmost2dBlowerthanothersinPSNR.Inotherwords,thedegree1ripplet-IneedsmorecoecientstoachievethesamePSNRasotherhighdegreeripplets.Degree1ripplet-Ihasisotropicbehaviorandisnotdirectionallysensitive,whereastheotherrippletsareanisotropicandcancapturethesingularitiesalongcurvesinthetestimages.Thegapbetweenperformancecurvesshowsthattheanisotropyhelpsalotinrepresenting2Dsingularitieseciently.Ripplet-Itransformswithdegree4anddegree3achievethesamehighestPSNRforthesamenumberofcoecients.Highdegreeripplet-Ihasmorecompactsupportandmoredirectionalsensitivity,whichcancapturemoreaccurateinformationaboutsingularities.Inourexperiments,whend>3,theperformanceisthesamewithd=3.Sincethediscreteimplementationofripplet-Iisbasedonthepowerof2,thedierenceinperformancebroughtbydegreedonlyappearsinnescales.Thehigherthedegreeis,thenerthescaleis.Usually,fornormalimagesizesuchas256256,d=3isthehighestdegreeusedinourexperiments. 37

PAGE 38

BMultiplelines CParaboliccurves DCubiccurves Testimages. ripplet-Itransform.ThewaveletusedinDWTis`9-7'biorthogonalwavelet[ 9 ],[ 47 ].Thediscreteripplet-Itransformusesripplet-Iwithc=1andd=3. TheresultsinFig. 2-6A showthatripplet-IcanachievethehighestPSNRwhenthenumberofretainedcoecientsislessthan5500.Meanwhile,ripplet-IcanprovidebettervisualqualitythanDWTandDCT.Wecanseethatripplet-Iavoidsthe`ringing'artifacts 38

PAGE 39

BMultiplelines CParaboliccurves DCubiccurves ComparingnonlinearapproximationperformanceofrippletswithxedsupportanddierentdegreescorrespondingtothetestimagesinFig. 2-4 ofwaveletasshowninFig. 2-6C andblockyartifactsofDCTasshowninFig. 2-6D .However,whenusingmorecoecients,ripplet-Iwillnolongerbethebest.Therefore,rippletshavethestrongcapabilityofrepresentingthestructureofimageswithfewernumberofcoecients. 39

PAGE 40

48 ]usedinJPEG2000[ 13 ]tocodethecoecients,inwhichanadaptivebinaryarithmeticcoderisusedforentropycoding[ 49 ]. Wecomparedtheperformanceofripplet,JPEGandJPEG2000,forthreecases,namely,thecroppedimagefrom`barbara',thetexture-richimagesintheUSCdatabase[ 50 ]andnaturalimages. 2-7 ,wecomparethesubjectivequalityoftheripplet-IbasedcodecandJPEG2000.Itisobviousthattherippletbasedcodecpreservesmoredetailsoftexture,comparedtoJPEG2000. 2-8 .AsshowninTable 2-1 ,theripplet-IbasedcodecachievesaslightlyhigherPSNRatlowbitrate,comparedtoJPEG2000. Table2-1. PSNRcomparisonofRipplet-IandJPEG2000at0.03125bpp Textureimages PSNRofRipplet-I(dB) PSNRofJPEG2000(dB) (a) 14.79 14.76 (b) 11.58 11.45 (c) 11.72 11.59 (d) 20.82 20.72 (e) 20.99 20.96 40

PAGE 41

2-9 andTable 2-2 indicatethattheripplet-IbasedcodecoutperformsJPEGbyupto3.3dBonaverageatthesamebit-rate.Theripplet-Iwithdegree3outperformscurvelet(degree2)asshowninFig. 2-9A 2-9B .InFig. 2-9C ,ripplet-Iwithdegree3canachievesimilarPSNRascurveletdoes,especiallyinlowbit-rate.ComparedtoJPEG2000,theripplet-Ibasedcodecachievesabout1dBlowerPSNRonaverageatthesamebit-rate.However,theripplet-IbasedcodeccanprovidebettersubjectivequalityasshowninFig. 2-10 .Whencompressionratioishigh,therearealotofwhitespotsaroundthefaceintheimagecodedbyJPEG2000inFig. 2-10B ,whilenoobviousartifactsappearintheimagecodedusingripplet-ItransforminFig. 2-10A .Moreover,theripplet-Ibasedcodeckeepsmoredetailsaroundthebeardin`mandrill'thanJPEG2000does. Table2-2. AveragePSNRgainofripplet-I(c=1,d=3)basedcodec,comparedtoJPEGandJPEG2000,respectively Barbara Mandrill Tiany AveragePSNRgain(dB)overJPEG 2.9 1.2 3.3 AveragePSNRgain(dB)overJPEG2000 -1.3 -0.8 -1.6 wheren(n1,n2)areindependent,identicallydistributedGaussianrandomvariableswithzeromeanandvariance2. Imagedenoisingalgorithmsvaryfromsimplethresholdingtocomplicatemodelbasedmethods.Sinceripplet-Itransformprovidesasparserepresentationofimages,simplehardthresholdinginripplettransformdomaincanremovemostofnoise.Inourexperiments,weusethefollowinghardthresholdingscheme:inthetransformdomain,acoecientwhosemagnitudeissmallerthanthepre-determinedthresholdissettozero;otherwise,thecoecientisunchanged.Thenwereconstructtheimagebyinversetransform.Inthe 41

PAGE 42

AsshowninFig. 2-11 ,theripplet-ItransformcanrestoretheedgesbetterthanDWT.Thereasonisthatripplettransformcanrepresenttheseedgesverysparsely,whereasnoisewillhavesmallvaluesinallripplet-Icoecients.Thenhardthresholdingcanremovethenoisewithlittledamagetotheimages.Ontheotherhand,wavelettransformcannotrepresentedgeswell;soedgesareblurredduetohardthresholding. Experimentalresultsindicatedthatripplet-Itransformcanprovideamoreecientrepresentationofimageswithsingularitiesalongsmoothcurves.Usingafewcoecients,ripplet-IcanoutperformDCTandwavelettransforminnonlinearapproximation.ItispromisingtocombinerippletsandothertransformssuchasDCTtorepresenttheentireimage,whichcontainsobjectboundariesandtextures.Ripplet-Itransformisusedtorepresentthestructureandtexture,whileDCTisusedtocompresssmoothparts.Thesparserepresentationofripplet-Itransformalsodemonstratespotentialsuccessinimagedenoising. 42

PAGE 43

BOriginalimage CRipplet-IbasedNLAwith5000largestcoecients,PSNR=31.13dB. DWaveletbasedNLAwith5000largestcoecients,PSNR=30.13dB. EDCTbasedNLAwith5000largestcoe-cients,PSNR=29.90dB. Performancecomparisons. 43

PAGE 44

BRipplet-Ic=1,d=3,PSNR=25.39dB CCurvelet(Ripplet-Ic=1,d=2),PSNR=24.12dB DJPEG2000,PSNR=22.37dB Thevisualqualitycomparisonbetweenripplet-IbasedimagecodecandJPEG2000forapatchcroppedfrom`barbara',whenbppisequalto0.3. 44

PAGE 45

B C D E Texture-richimagesusedinourexperiment 45

PAGE 46

BMandrill CTiany PSNRvs.bppforripplet-Ibasedimagecodec,JPEGandJPEG2000. 46

PAGE 47

BRipplet-I,PSNR=22.76dB CJPEG2000,PSNR=23.18dB Thevisualqualitycomparisonbetweenripplet-IbasedimagecodecandJPEG2000for`mandrill',whenbppisequalto0.25. 47

PAGE 48

Bnoisyimage,PSNR=24.61dB CRipplet-Itransform,PSNR=27.55dB DDWT,PSNR=27.01dB Ascale-updetailsofdenoisedtestimage`barbara'.Thestandardvarianceofnoiseis15. 48

PAGE 49

2 ]providesamethodologytorepresentsignalseciently.Specically,harmonicanalysisisintendedtoecientlyrepresentasignalbyaweightedsumofbasisfunctions;heretheweightsarecalledcoecients,andthemappingfromtheinputsignaltothecoecientsiscalledtransform.Inimageprocessing,Fouriertransformisusuallyused.However,Fouriertransformcanonlyprovideanecientrepresentationforsmoothimagesbutnotforimagesthatcontainedges.Edgesorboundariesofobjectscausediscontinuitiesorsingularitiesinimageintensity.Howtoecientlyrepresentsingularitiesinimagesposesagreatchallengetoharmonicanalysis.Itiswellknownthatone-dimensional(1D)singularitiesinafunction(whichhasnitedurationorisperiodic)destroythesparsityofFourierseriesrepresentationofthefunction,whichisknownasGibbsphenomenon.Incontrast,wavelettransformisabletoecientlyrepresentafunctionwith1Dsingularities[ 11 12 ].However,typicalwavelettransformisunabletoresolvetwo-dimensional(2D)singularitiesalongarbitrarilyshapedcurvessincetypical2Dwavelettransformisjustatensorproductoftwo1Dwavelettransforms,whichresolve1Dhorizontalandverticalsingularities,respectively. Toovercomethelimitationofwavelet,ridgelettransform[ 19 37 ]wasintroduced.Ridgelettransformcanresolve1Dsingularitiesalonganarbitrarydirection(includinghorizontalandverticaldirection).RidgelettransformprovidesinformationaboutorientationoflinearedgesinimagessinceitisbasedonRadontransform[ 38 ],whichiscapableofextractinglinesofarbitraryorientation. Sinceridgelettransformisnotabletoresolve2Dsingularities,CandesandDonohoproposedtherstgenerationcurvelettransformbasedonmulti-scaleridgelet[ 39 40 ]. 49

PAGE 50

41 42 ].Curvelettransformcanresolve2Dsingularitiesalongsmoothcurves.Curvelettransformusesaparabolicscalinglawtoachieveanisotropicdirectionality.Fromtheperspectiveofmicrolocalanalysis,theanisotropicpropertyofcurvelettransformguaranteesresolving2DsingularitiesalongC2curves[ 21 41 { 43 ].Similartocurvelet,contourlet[ 44 45 ]andbandlet[ 46 ]wereproposedtoresolve2Dsingularities. However,itisnotclearwhyparabolicscalingwaschosenforcurvelettoachieveanisotropicdirectionality.Toaddressthis,we[ 51 ]proposedanewtransformcalledripplettransformTypeI(ripplet-I),whichgeneralizesthescalinglaw.Specically,ripplet-Itransformgeneralizescurvelettransformbyaddingtwoparameters,i.e.,supportcanddegreed;hence,curvelettransformisjustaspecialcaseofripplet-Iwithc=1andd=2.Thenewparameters,i.e.,supportcanddegreed,provideripplet-Iwithanisotropycapabilityofrepresenting2Dsingularitiesalongarbitrarilyshapedcurves. Inspiredbythesuccessofridgelettransform,we[ 52 ]proposedanewtransformcalledripplettransformTypeII(ripplet-II),whichisbasedongeneralizedRadontransform[ 53 ][ 54 ].ThegeneralizedRadontransformconvertscurvestopoints.Itcreatespeakslocatedatthecorrespondingcurveparameters.Intuitively,ourripplet-IItransformconsistsoftwosteps:1)usegeneralizedRadontransformtoconvertsingularitiesalongcurvesintopointsingularitiesingeneralizedRadondomain;2)usewavelettransformtoresolvepointsingularitiesingeneralizedRadondomain.Inthischapter,weextendtheripplet-IItransforminmorecasesandorthogonalripplet-IItransformisproposed. Toelaborate,werstdenetheripplet-IIfunctionsanddevelopripplet-IItransformandorthogonalripplet-IItransforminthecontinuousspace.Thenthediscreteripplet-IItransformandorthogonalripplet-IItransformaredened.Ridgelettransformisjustaspecialcaseofripplet-IItransformwithdegree1.Propertiesofripplet-IItransformareexploredanddemonstratedbyexperimentalresults. 50

PAGE 51

Thereminderofthischapterisorganizedasbelow.Section 3.2 reviewsgeneralizedRadontransform.InSection 3.3 ,weintroduceripplet-IItransforminbothcontinuousanddiscretecases.Section 3.4 presentsthepropertiesofripplet-IItransform.ExperimentalresultsareshowninSection 3.5 andfollowedbyconclusioninSection 3.6 55 ].ClassicalRadontransformisdenedin2Dspaceastheintegralofaninput2Dfunctionoverstraightlines.Fora2Dintegrablereal-valuedfunctionf(x,y)where(x,y)2R2,classicalRadontransformoff(x,y)isdenedby Or,wecanconvertf(x,y)tof(,)inpolarcoordinatesystem,thenclassicalRadontransformcanbecalculatedby TheclassicalRadontransformisinvertible.TheoriginalfunctioncanbereconstructedbasedontheProjection-slicetheorem[ 14 ].ToextendtheclassicalRadontransform,researchersproposedgeneralizedRadontransform,whichisbasedonanintegralalongafamilyofcurves[ 53 ][ 54 ].Inthepolarsystemwithcoordinates(,),acurvecanbedenedby whererandarexed,andddenotesdegree.Ford=1andd=2,Eq.( 3{3 )representsstraightlineandparabolaasshowninFigure 3-1A and 3-1B ,respectively.Ford=1andd=2,Eq.( 3{3 )representscirclesthroughtheoriginandCardioidsasshownin 51

PAGE 52

B C D E F CurvesdenedbyEq.( 3{3 )inCartesiancoordinates.(a)d=1.(b)d=2.(c)d=3.(d)d=1.(e)d=2.(f)d=3. Figure 3-1D and 3-1E ,respectively.When00as`positivecurves'andd<0as`negativecurves'intherestofthischapter. ThegeneralizedRadontransformalongcurvescanbedenedinthepolarcoordinates(,)by 52

PAGE 53

3{4 )reducestotheclassicalRadontransforminEq.( 3{2 ).Ifd=2,Eq.( 3{4 )becomes[ 56 ] 2)p whereR[f(x,y)](r,)denotestheclassicalRadontransformthatmapsf(x,y)toR(r,),andisdenedinEq.( 3{1 );notethatf(,)underthepolarcoordinatesystemneedstobeconvertedtof(x,y)underCartesiancoordinatesystembeforecomputingEq.( 3{1 ).Eq.( 3{5 )showsthatford=2,thegeneralizedRadontransformcanbeimplementedviatheclassicalRadontransformwithappropriatesubstitutionsofvariables. Forthegeneralcase,i.e.,d2Z,thegeneralizedRadontransformcanbecomputedviaFourierseries[ 53 ][ 54 ].Letf(,)bea2Dfunctiondenedinpolarcoordinates(,)andGRd(r,)beitsgeneralizedRadontransform.AssumethattheFourierseriesforf(,)exists,i.e., where ThenthegeneralizedRadontransformcanbecomputedby whereford>0 whereTn()istheChebyshevpolynomialofdegreen,andfn()isgivenbyEq.( 3{7 ). 53

PAGE 54

3{6 ),( 3{7 ),( 3{8 ),( 3{9 )together,wehavethegeneralizedRadontransformofthefunctionfas Theinversetransformisdenedby dZ1Z20GRd(r,)einTnd((r=)1=d)((r=)2=d1)1=21=d rdrd]ein Fornegativecurves(i.e.d<0),theGeneralizedRadontransformis Theinversetransformfornegativecurvesisdenedby dZ0Z20GRd(r,)einTnd((r=)1=d)((r=)2=d1)1=21=d rdrd]ein 3.3.1ContinuousRipplet-IITransform 54

PAGE 55

3-2 .Ripplet-IIcanbescaled,translatedandrotatedaccordingtotheparametersa,b,.Notethatwhend=1,ripplet-IIreducestoridgeletasshowninFigure 3-3 ;i.e.,ridgelettransformisjustaspecialcaseofripplet-IItransformwithd=1. B C D Ripplet-IIfunctionsinCartesiancoordinates(x1,x2)(a)a=1,b=0,d=2and=0.(b)a=2,b=0,d=2and=0.(c)a=1,b=0.05,d=2and=0.(d)a=1,b=0,d=2and=30. 55

PAGE 56

B C D Ripplet-IIfunctionsinCartesiancoordinates(x1,x2)(a)a=1,b=0,d=1and=0.(b)a=2,b=0,d=1and=0.(c)a=1,b=0.05,d=1and=0.(d)a=1,b=0,d=1and=30. whereisthecomplexconjugateof,andf(,)isunderthepolarcoordinatesystem. Ripplet-IItransformhasthecapabilityofcapturingstructureinformationalongarbitrarycurvesbytuningthescale,location,orientation,anddegreeparameters.From 56

PAGE 57

3{15 ),wehave where(a)isduetoEq.( 3{15 );(b)isduetoEq.( 3{14 );(c)isduetoEq.( 3{4 )andGRd[f]isthegeneralizedRadontransform(GRT)offunctionf.Eq.( 3{16 )showsthatripplet-IItransformcanbeobtainedbytheinnerproductbetweenGRTand1Dwavelet,whichisthe1Dwavelettransform(WT)ofGRToffunctionf;i.e.,theripplet-IItransformoffunctionfcanbeobtainedbyrstcomputingGRToff,andthencomputing1DWToftheGRToffasbelow: wherethe1DWTiswithrespectto(w.r.t.)r. Indetails,ripplet-IItransformoffcanalsobeobtainedthrough dZ1Z20Z10Z11 b)einTnd((r=)1=d)((r=)2=d1)1=21=d rdadbdrd]ein 57

PAGE 58

3{17 ),theinverseoftheripplet-IItransformoffunctionfcanbeobtainedbyrstcomputinginverseWT(IWT)ofRf(a,b,d,)w.r.t.aandb,andthencomputinginverseGRT(IGRT)asbelow: whereIGRTcanbecomputedbythemethodinSection 3.2 ,Eq.( 3{11 ). 3{17 ),ripplet-IItransformcanbeimplementedasa1DwavelettransformalongtheradiusofthegeneralizedRadondomain.Ifweapply2DwavelettransformtothegeneralizedRadoncoecients,theadditionalwavelettransformalongangleholdsthepotentialofimprovingthesparsityoftransformcoecients.Wecallthenewextensionorthogonalripplet-IItransform. Inmathematics,orthogonalripplet-IItransformofafunctionf(,)inpolarcoordinatesisdenedby Similartoripplet-IItransform,orthogonalripplet-IItransformofthefunctionfcanbeobtainedbyrstcomputingGRToff,andthencomputing2DWToftheGRToffasbelow: Thereisnodirectionparameterinorthogonalripplet-IIcoecientsRorthf(a,b1,b2,d).Thismaynotprovideexplicitinformationaboutthedirectionsofcurves.However,due 58

PAGE 59

Orthogonalripplet-IItransformisalsoinvertible.Givenorthogonalripplet-IIcoecientsRorthf(a,b1,b2,d),wecanreconstructtheoriginalfunctionfthrough dZ1Z20Z10Z1Z11 b1)'(a b2)einTnd((r=)1=d)((r=)2=d1)1=21=d rdadb1db2drd]ein Reversingtheprocessin( 3{22 ),theinverseoftheorthogonalripplet-IItransformoffunctionfcanbeobtainedbyrstcomputinginverse2DWT(IWT)ofRorthf(a,b1,b2,d)w.r.t.a,b1andb2,andthencomputinginverseGRT(IGRT)asbelow: 3{17 ),discreteRipplet-IItransformoffunctionfcanbeobtainedbyrstcomputingdiscreteGRT(DGRT)off,andthencomputing1DdiscreteWT(DWT)oftheDGRToffasbelow: Thediscreteorthogonalripplet-IItransformfollowstheparadigmin( 3{22 )andisobtainedby Ifd=2,thereisasimplermethodtocomputerdiscreteripplet-IItransform,thedetailsofwhichwillbeelaboratedinSection 3.3.4 59

PAGE 60

B C D E F G H I Gaussianimageswithacurveedge.Toprow:originalimagef(x,y);Middlerow:Magnitudeof2DFouriertransform;Bottomrow:Magnitudeof2DFouriertransformaftersubstitutingthepolarcoordinate(r0,0)with(p 60

PAGE 61

56 ].Eq.( 3{5 )showsthatford=2,thegeneralizedRadontransformcanbeimplementedviatheclassicalRadontransformwithappropriatesubstitutionsofvariables.Hence,wecancomputediscreteripplet-IItransformviaEqs.( 3{25 )and( 3{5 ). 1. ConvertCartesiancoordinatestopolarcoordinates,i.e.,convertf(x,y)tof(,).Forf(,),substitute(,)with(02,20).Convertpolarcoordinates(0,0)toCartesiancoordinates(x,y),andobtainnewimagef1(x,y)byinterpolation,wherexandyareinteger-valued. 2. ApplyclassicalRadontransformtof1(x,y),resultinginR(r0,0).InfunctionR(r0,0),substitute(r0,0)with(p 3{5 ). 3. Apply1DwavelettransformtoGR2(r,)withrespecttor,andobtaintheripplet-IIcoecients. Toshowthesparsityofripplet-IItransformwithd=2,weplotFigure 3-4 .Asweknow,ridgelettransformofa2Dfunctionf(x,y)iscomputedby1)2DFouriertransformoff(x,y),2)convertingCartesiancoordinatesystemtothepolarcoordinatesystem(!0,0),3)1DinverseFouriertransformw.r.t.!0,resultingin(r0,0),4)1Dwavelettransformw.r.t.r0.Incontrast,ripplet-IItransformofa2Dfunctionf(x,y)iscomputedby1)2DFouriertransformoff1(x,y),2)convertingCartesiancoordinatesystemtothepolarcoordinatesystem(!0,0),3)1DinverseFouriertransformw.r.t.!0,resultingin(r0,0),4)substituting(r0,0)with(p 3-4 ,weobtainridgelettransformcoecients.Ifweapply1DwavelettransformtothebottomrowinFigure 3-4 ,weobtainripplet-IItransformcoecients.ItisobservedthattheFourier 61

PAGE 62

3-4 aresparserthanthoseinthemiddlerowofFigure 3-4 .Thisiswhyripplet-IItransformprovidessparsercoecientsthanridgelettransform.Inotherwords,substituting(r0,0)with(p 1. Apply1Dinversewavelettransformtoripplet-IIcoecientswithrespecttor,resultinginGR2(r,). 2. InfunctionGR2(r,)=(2p 3. ApplyclassicalinverseRadontransformtoR(r0,0),resultinginf1(x,y). 4. Forf1(x,y),convertCartesiancoordinates(x,y)topolarcoordinates(0,0),resultinginf1(0,0).Forfunctionf1(0,0),substitute(0,0)with(p 62

PAGE 63

BParabola Ripplet-IIwithdierentdegrees. InFigure 3-5 ,ripplet-IIrepresentationofimageswithdierentdegreesarepresented.Itcanbeobservedthatdierentdegreespresentdierentlevelsofsparsity. 63

PAGE 64

3-6 showsthemagnitudeoftransformcoecientsinadecreasingorderforwavelet,ridgelet,ripplet-IIandorthogonalripplet-IItransforms;themagnitudeofcoecientsofeachtransformisnormalizedbythelargestcoecientofthecorrespondingtransform.Itcanbeobservedthatripplet-IIhasthefastestdecayincoecients,comparedtowaveletandridgelet.Thisisthereasonwhyripplet-IItransformcanprovidesparserrepresentationforimageswithedgesthanwaveletandridgelet.InFigure 3-6 ,orthogonalripplet-IIdemonstratesfasterdecaythanripplet-II,whichindicatesthatorthogonalripplet-IItransformcanprovidesparserrepresentationoffunctionsthanripplet-IIasstatedinSection 3.3.2 Besidestheaforementionedproperties,ripplet-IItransformcanproviderotationinvariance.Weshowthisasbelow.Ifwehaveanimagef1(x,y)aswellasitsrotatedversionf2(x,y)rotatedbyanangle,i.e., Inthepolarcoordinatesystem,wehave Soripplet-IItransformoff2is Applying1DFouriertransformonbothsidesofEq.( 3{29 )withrespectto,wehave 64

PAGE 65

BBarbara Comparisonofcoecientdecayingbetweenwavelet,ridgelet,ripplet-IIandorthogonalripplet-IIwithd=2. 65

PAGE 66

57 ].Thetexturevolumeconsistsof2sub-databases,allofwhichcontainmonochrometextureimages. 58 ]inthetexturevolumecontainsasetofrotatedtextures.Eachimageinthesub-databaseisofsize512152pixels.Thesub-databasecontainsatotalof13texturesasshowninFigure 3-7 andeachtexturehas7versions,whicharerotatedby0,30,60,90,120,150,and200.Hence,thesub-databasecontainsatotalof137=91images.Next,wedescribethefeatureextractionandclassicationalgorithmsusedintheexperiments. Table3-1. Informationextractedfromdata Notation Equation Description C1 C2 C3 C4 66

PAGE 67

Texturesusedintextureclassication. geometryinformationfromtheoriginalimageasshowninTable 3-1 [ 59 ].Therotationinvariantpropertyofripplet-IItransformguaranteesthatrotatedimageshavealmostthesamefeaturevectorasthatoftheoriginalimage.ThestatisticalinformationfromTable 3-1 canalsoproviderotationinvariantpropertyforwavelettransformapproach. 60 ]tothistransformcoecientmatrixandobtaineigenvalues/eigenvectorsofthematrix.PCAprovidesatransformationmatrix,whichconsistsofprinciplecomponents(normalizedeigenvectors);wemultiplythetransformationmatrixandatransformcoecientvector,resultinginafeaturevector.Wechoosethefeature 67

PAGE 68

Texturesrotatedwithdierentangles. dimensionsthatcorrespondstotheNDprincipalcomponentswhoseeigenvaluesarelargest.Hence,theresultingfeaturevectorisND-dimensional. Wetestfourtypesoftransforms,i.e.,ridgelet,wavelet,ripplet-IIandorthogonalripplet-IItransform.TextureclassicationperformancesarelistedinTable 3-2 andTable 3-3 correspondingtofeatureextraction1andfeatureextraction2,respectively.Table 3-2 showsthatripplet-IItransformachieveslowererrorratethanwavelettransformunderallfeaturevectorlengthstestedandunderC1,C3,andC4;ripplet-IItransform 68

PAGE 69

Errorrateunderdierenttransformsusingfeatureextraction1 FeaturedimensionND Ridgelet Ripplet-II Orth-Ripplet-II 1 0.4885 0.9341 0.2802 0.1099 4 0.0441 0.8462 0.0019 0.1319 16 0.0187 0.7912 0 0.0879 64 0.0057 0.8462 0 0.0879 256 0.0024 0.9121 0 0.0879 C2 1 0.2826 0.4286 0.1715 0.1099 4 0.0929 0.3516 0.0728 0.1099 16 0.0536 0.4176 0.0608 0.1319 64 0.0374 0.5385 0.0580 0.1429 256 0.0172 0.5824 0.0656 0.1099 C3 1 0.2883 0.4286 0.1844 0.1209 4 0.0072 0.3077 0.0024 0.1319 16 0.0010 0.3846 0.0010 0.1099 64 0.0010 0.4835 0.0010 0.1099 256 0.0019 0.5055 0 0.0879 C4 1 0.6624 0.8352 0.4061 0.5824 4 0.8841 0.8352 0.1049 0.3407 16 0.9176 0.7143 0.0532 0.2967 64 0.9746 0.6923 0.2055 0.2967 256 0.9909 0.6374 0.1173 0.2857 3-2 showsthatripplet-IItransformoutperformsothertransformsinalmostallcasesbyachievinglowesterrorrate.However,orthogonalripplet-IItransformisworsethanripplet-IIandwavelet.Table 3-3 showsthatripplet-IItransformachieveslowererrorratethanridgelet,waveletandorthogonalripplet-IItransformunderallfeaturedimensionstested.Thereasonwhyripplet-IItransformachievesthebestclassicationperformanceistwo-folded.First,ripplet-IItransformisabletoprovidesparserfeaturevectorsthanridgeletandwavelettransform.Second,therotationinvariantpropertyofripplet-IItransformguaranteesthatrotatedimageshavealmostthesamefeaturevectorasthatoftheoriginalimage. 69

PAGE 70

Errorrateunderdierenttransformsusingfeatureextraction2 FeaturedimensionND Wavelet Ripplet-II Orth-Ripplet-II 1 0.3908 0.2639 0.1341 0.1758 2 0.2648 0.0752 0.0172 0.1538 4 0.2261 0.0527 0.0029 0.1538 8 0.1676 0.0350 0.0005 0.1319 16 0.1058 0.0177 0 0.1319 32 0.0704 0.0105 0 0.1319 Asub-databasenamedTextures[ 50 ]inthetexturevolumecontains58images,eachofwhichcontainsonetypeoftexture.Amongthe58images,33imagesareofsize512152pixelsand25imagesareofsize10241024pixels.Totesttherotation-invariantcapabilityofdierenttransforms,weneedtocreaterotatedversionsoftheimagesinthesub-database.Toachievethis,werstrotateatextureimagebyanglesfrom0to350withastepsize10;thenwecropapatchofsize128128pixelsfromthecenterregionoftherotatedimage.Bydoingso,weobtain5836=2088images. Imageretrievalisdoneinthefollowingsteps.First,atestimageisgivenasaquerytotheimageretrievalsystem.Second,applyafeatureextractionalgorithm(whichisthesameasthatinSection 3.5.1 )tothetestimage,andobtainafeaturevector.Third,applykNNclassier(wherek=35)tothetestfeaturevectorandthe2088imagesserveastrainingsamplesforthekNNclassier;thedistancemeasureusedinkNNisND-dimensionalEuclideandistance.Assumethatinthekimages(whichareoutputofthekNNclassier),Nrimagesarerotatedversionsofthetestimage.WecallNr=kasretrievalrate,whichrepresentsthesuccessrateofimageretrieval.Wetestfourtypesoftransforms,i.e.,ridgelet,wavelet,ripplet-IIandorthogonalripplet-IItransform.Table 3-4 and 3-5 listaverageretrievalrateusingdierentfeatureextractionapproaches.Table 3-4 showsthatripplet-IIandorthogonalripplet-IItransformshavehigheraverageretrievalratethan 70

PAGE 71

3-5 showsthatripplet-IItransformachieveshigheraverageretrievalratethanridgeletandwavelettransformunderallfeaturedimensionstested.Orthogonalripplet-IItransformoutperformswaveletandridgelettransform. Table3-4. Averageretrievalrateunderdierenttransformsusingfeatureextraction1 FeaturedimensionND Ridgelet Ripplet-II Orth-Ripplet-II 1 0.4174 0.4521 0.6312 0.8135 4 0.6669 0.6544 0.9948 0.8540 16 0.7034 0.6869 0.9959 0.9409 64 0.7016 0.6632 0.9927 0.9425 256 0.7382 0.6094 0.9876 0.9695 C2 1 0.6394 0.4489 0.7653 0.7659 4 0.7182 0.6438 0.7883 0.7692 16 0.7591 0.6079 0.7884 0.8386 64 0.7885 0.5237 0.7894 0.8967 256 0.8070 0.4536 0.7905 0.8876 C3 1 0.6384 0.3344 0.7581 0.7536 4 0.8511 0.5968 0.9299 0.7870 16 0.8384 0.5682 0.9413 0.9220 64 0.7863 0.5290 0.9485 0.9632 256 0.7532 0.4923 0.8701 0.7731 C4 1 0.1754 0.1810 0.5551 0.2421 4 0.0911 0.1529 0.7660 0.3226 16 0.0539 0.1203 0.7687 0.3963 64 0.0278 0.0883 0.4248 0.3810 256 0.0190 0.0507 0.5305 0.3749 Averageretrievalrateunderdierenttransformsusingfeatureextraction2 FeatureDimensionND Wavelet Ripplet-II Orth-Ripplet-II 1 0.5013 0.6426 0.803 0.8016 2 0.5773 0.8513 0.9495 0.8940 4 0.5921 0.8898 0.9831 0.9343 8 0.6343 0.9154 0.986 0.9831 16 0.7054 0.941 0.9869 0.9866 71

PAGE 72

3.5.1 ,thedatabaseusedherehassmallerin-classdistance.Experimentalresultsshowthatripplet-IIworkswellforbothlargeandsmallin-classdistances.Orthogonalripplet-IItransformonlyworksforsmallin-classdistancecase. 72

PAGE 73

4.1 .InH.264/AVC,thestate-of-the-artvideocodingstandard,anin-loopdeblockinglter[ 61 ]isadoptedtoremovediscontinuitiesatblockboundaries.Alow-passlterisadaptivelychosenaccordingtothecharacteristicoftheblockboundaries.Thedeblockinglterremovesmostofblockyeectsandimprovesthecodeeciency.Nevertheless,theothertypesofartifactsstillremain,becausethelteronlyaectspixelsneartheblockboundaries. Toremovemoregeneralcompressionartifacts,somesparsity-baseddenoisinglteringwereproposed.[ 62 ]and[ 63 ]proposedlocalapproachesthatadapttothenonstationaryofthecontentbyapplyingasparseimagemodeltothelocalneighborhood.Multiplerepresentationsofapixelareachievedbyanovercompletesetoftransformsandathresholdingoperation.Then,afusionprocessisappliedtoobtainthenaldenoisedpixel.AlthoughtheseltersachievesuperiorobjectiveandsubjectivequalitiescomparedtotheH.264deblockinglter,thetransformsetshouldbecarefullyselectedbasedonthecharacteristicsoflocalneighborhoodinordertoecientlydecoupletruesignalandnoise.Anotherclassofsparsity-baseddenoisingapproachestrytoexploittheexistingnon-localself-similaritieswithinapictureorvideo.In[ 34 ],theBlock-Matchingand3Dltering(BM3D)non-localapproachisproposed,whichachievesstate-of-the-artperformancein 73

PAGE 74

B C D Codingartifacts:(A)Edgedistortion.(B)Ringingeects.(C)Texturedistortion.(D)Blockyartifacts. 74

PAGE 75

35 ],VideodenoisingbyBlock-Matchingand3Dltering(VBM3D)isproposedtoextendBM3Dtovideodenoisingbyintroducingmultiplepicturesinthesearchregion.Thework[ 36 ]wentbeyondVBM3Dbyextendingtheoperationatomfroma2Dblocktoamoregeneral3Dpatch.Transformdimensionalityisalsoincreasedto4D.Thisextensionconsiderslocalmotionfeaturesduringthepatchclusteringprocess.Finally,in[ 36 ],thewholedenoisingalgorithmisformulatedunderavariationbayesianframework.AvariationalEMalgorithmisappliedtoiterativelyrenethedenoisingresultandpatchclusters.Thisnewapproachachievesgoodperformanceindenoisingandothervideoprocessingapplications.Inthischapter,weputtheabovethreedenoisingapproaches(2Dlocal[ 63 ],3Dnonlocal[ 34 ],and4Dnonlocal[ 36 ])intoacommonvideocompressionartifactremovalapplication.ThethreedenoisingapproachesareappliedtoprocesstheH.264/AVCcompressedvideosequencesasapost-lter.First,weunifythemintoade-artifactingframework.Afterthat,ourgoalistoinvestigatetheperformanceoftheapproachesunderthiscommonframework. Theremainderofthischapterisorganizedasfollow.Section 4.2 introducestheframeworkwesetupforthisinvestigation.Section 4.3 describesthede-artifactingalgorithmsimplementedinthisframework.Section 4.4 analyzesthecomplexityofproposedframework.PerformancecomparisonsandanalysisarepresentedinSection 4.5 75

PAGE 76

36 ]exploitsaniterativeEMalgorithmtoenhancethepatchclusteringanddenoisingresultsgivingbetterresults.However,thecomplexityistoohighforvideocompression,especiallyforpotentialrealtimeapplications.Indierentalgorithms,dierenttransformsareused,soitishardtocometofaircomparisons.Inourframework,ononehand,weexpectthatthede-artifactingmayreuseexistingresourcesoftheH.264/AVCcodec.Ontheotherhand,wehavetoalignthealgorithmsasmuchaspossibleforafaircomparison.Therefore,aunicationeortisneeded.Theunicationisdonewithoutchangingthecoreideaoftheoriginalalgorithms,whileputtingtheminthesamebaseline.Inthenextsection,weintroducetheindividualcorecomponentsofde-artifactingalgorithmsthatwehaveimplementedbasedonboththeoriginalcounterpartsandtheaboveunicationcriteria. 4-2 .Withthepatchdenition,avideosequence 76

PAGE 77

2Dand3Dpatchesinavideosequence. Figure4-3. Flowchartoftheproposedframework.2Dpatchesareusedfordemonstration. isassumedtobecomposedofmanyoverlappingpatches,either2Dor3D.Theproposedde-artifactingframeworkisshowninFigure 4-3 .Thekeycomponentsincludesimilarpatchsetformulation(similarpatchsearchandorganization),sparsityenforcement(transformandltering),andmultiplehypothesisfusion(weightedaveraging),whichcorrespondtothecorecomponentsofthethreeapproaches.Weelaborateontheminthefollowingsubsections. 77

PAGE 78

62 ]and[ 63 ],wecanassumethepre-xedsimilarpatchset(i.e.,neighboringpatches)isused.Itdoesnothurttounifyitintothegeneralsimilarpatchsetformulationprocess.Thesimilarpatchsetformulationisexplicitandveryimportantfortheworksin[ 34 ]and[ 36 ].In[ 62 ],[ 63 ],and[ 34 ],2Dpatchistheatomofthealgorithm,whichisdierentfromthatin[ 36 ],wherea3Dpatchisused.Wedenethefollowinggeneralsimilaritymeasurementforgeneratingthesimilarpatchset: whereS(P0)isthesimilarpatchsetofapatchP0,Pisthewholepatchsetinthesearchrange,PidenotestheithpatchinP,d(Pi,P0)isthedistancebetweenP0andPi,andmatchisthemaximumdistance.Ifthedistanceissmallerthanmatch,thentwopatchesareconsidered"similar".Ifd(,)isdenedasthephysicallocationoftwopatchesinavideolattice,thepre-xedsimilarpatchsetin[ 62 ]and[ 63 ]isuniedintoEq.( 4{1 ).Forthealgorithmsin[ 34 ]and[ 36 ],wedened(,)asEuclideanintensitydistanceoftwopatches,i.e.,d(Pi,P0)=jPiP0j22.Inourframework,thesearchrangemaycoverboththespatialandtemporaldomaininordertofullyexploitthevideosignalspatio-temporalredundancies. 62 ]and[ 63 ],thetransformisdirectlyappliedtoeachpatchinthesimilarpatchset.Inthetwootherapproaches,patchesinthesimilarpatchsetaresortedandpackedtogenerateahigherdimensionalarraysignal.ThisprocessisdemonstratedinFigure 4-4 .Patchesinthesimilarpatchsetaresortedaccordingtothesimilaritymeasureinascendingorder.Thenthesortedpatchesarestackedtoahigherdimensionarray.Becauseofthesimilaritysharedbythesepatches,thepackedarraysignalexhibitsnearstationarycharacteristics 78

PAGE 79

Patchsortingandpacking.Thedegreeofgrayinpatchesindicatesthesimilarity.Thelowerthedegreeis,themoresimilarthepathistothereferencepatch. alongthepackingdimension.Thiskindofarraysignalisveryeasytosparselydecomposebyanystandardbasis. Tosuppressthenoiseintransformdomain,variousapproachescanbeusedsuchasthresholdingorltering[ 64 ]. 79

PAGE 80

Unlikehardthresholding,softthresholdingalsochangesthosevaluesgreaterthangiventhresholdbysubtractingthethresholdvaluefromthem. Wienerlter[ 65 ]isusedtolteroutnoiseinacorruptedsignalbasedonstatisticalapproach.TheoutputofWienerlteriswrittenas whereYisanoisytransformcoecient,^Yisalteredcoecient,andisestimatedstandarddeviationofthenoise(orartifacts). 80

PAGE 81

4{5 ).Interestedreadercanreferto[ 36 ]forthedetailedderivation. whereyisthenoisypixel,xisthecleanversionofy,Piistheithpatchthathasoverlapwithpixelx.E[]isexpectationoperation.Weightingcoecientsaimaybeestimatedbythesparsitypriorsasin whereNnz(i)isthenumberofnon-zerocoecientsofthetransformedpatchPi.Theweightsareestimatedbythesparsitypriorsoftransformedpatch.Weassumethecompressionnoisehaslittlesparsecharacteristics,unlikethetrueimagesignal.Wegivemoreweighttothepatchesthataremoresparsebecauseitishighlyprobablethattheyprovideamoreaccurateestimate. 4.3.2 ,weproposedtheuseofWienerlterinthetransformdomain.AsshowninEq.( 4{4 ),weneedtoknow,thestandarddeviationofnoise.Inourwork,isestimatedatencoderandsentassideinformationtothedecoder.maybeinsertedinthesliceheaderorSEI(SupplementalEnhancementInformation).TheunieddenoisingalgorithmsmaybeincorporatedintoavideocodingschemeasapostprocessingtoolasshowninFigure 4-5 orasanin-looplterasshowninFigure 4-6 .Inthischapter,weincorporatethemasapostprocessinglteringinordertochecktheirartifactremovalperformance.Wecomparethede-artifactingperformanceinthefollowingsectionstakingintoaccountseveralaspects. 81

PAGE 82

Deartifactinglterasapost-processingtoolinavideoencoder. Figure4-6. Deartifactinglterasanin-looptoolinavideoencoder. 82

PAGE 83

Therstaddendinthebracketisthecomputationforthedistancecalculationofeachpatch.ThesecondoneisthecosttoretainonlyNspatcheswithsmallestdistances. Theaveragetransformcomplexityforeachpixelis Therstthreetermsinthenumeratorarethetransformcomplexityofapatch.Thelasttermisthecostoftransformingalongpatchesintheset. From( 4{7 )and( 4{9 ),wecantellthatthecomplexityincreasesifweincreasethedimensionofpatchesortransforms.Sotradeoshouldbemadebetweentheperformanceandcomplexity. 83

PAGE 84

carphone IntraOnly IPPP IntraOnly IPPP QP 2D 3D 2D 3D 2D 3D 2D 3D 22 1.24 0.58 27 1.04 0.69 32 0.94 0.73 37 0.84 0.65 Performance(PSNRgainindB)of2Dand3Dpatchesinvideoswithdierentmotioncharacteristics. frames.TheresultsarecomparedtothoseofH.264/AVCdeblockinglter.OurtestcongurationsareIntraonlyandIPPPusingQP=f22,27,32,37gonHighProle.ThetestvideosequencescoverfromQCIFtoHD(720p)resolutions.Inthesimulations,2isestimatedbythevarianceofthereconstructionerroratencoder.Weanalyzethesimulationresultsfromthreepointsofviewtoclarifytheadvantagesanddisadvantagesofthefeaturesofeachalgorithm. 4-1 .Weobservethata3DpatchdoesgivemuchmorePSNRimprovementwithrespecttoa2Dpatchinslowandsmoothmotionsequences(e.g.,container).However,whenmotionbecomescomplicated(e.g.thejitteringincarphone),s3Dpatchdoesnotwork.Asimilarphenomenonisobservedinmobile(smoothandslowmotion)andbus(complicateandfastmotion). 84

PAGE 85

4.3 ,patchesinasimilarpatchsetcanbepackedtogenerateahigherdimensionaldataarray.Thisdataarraycanbeeasilydecomposedbystandardbasissinceitisnearstationaryalongthepackingdimension.Inthissection,weseethatpackingtohigherdimensiondoesprovidemoregaininmostofthecases.Inthissimulation,weuse2Dpatchandkeepthesimilarpatchsetthesameforfaircomparison.Inonecase,wepackthepatchesinthesimilarpatchsetintoa3Ddataarrayanduse3DDCTtransform.Intheothercase,weapply2DDCTtothepatchesinthesimilarpatchsetindividuallywithoutpacking.Figure 4-7 showstheaveragePSNRgainofvetypicalHDsequences(Crew,Night,Shuttle start,City,Big ships)fortheintraonlyandIPPPcases.Basedontheseresults,wendthatdimensionincrementimprovesalottheperformanceinalltestedQPranges. 4-8 ,2Dlocal,3Dnonlocal,and4Dnonlocalcorrespondtotheuniedversionofalgorithmsin[ 63 ],[ 34 ],and[ 36 ],respectively.Figure 4-8 showstheaverageresultsoftheaboveveHDsequencescodedbyintraonlyandIPPPcodingformat.2Dlocalapproachcanonlyobtainminorgainsinceitonlyexploitsspatialinformation. Thetwonon-localapproachesaremoresensitivetoquantization,sincelargequantizationincursinmoreartifactsthatdegradetheaccuracyofthesimilarpatchset.InIPPPcase,withQPincreasing,therearemoreandmoreblocksthatarepredictedbyskipmode,whichmeansthatthepredictionblockisexactlythesameasthereconstructedblock.Asaconsequence,theseblockspredictedbyskipmodeareenhancingtheartifacts, 85

PAGE 86

4-9 showsdetailedcropsfromcodedHDsequences.Itisobviousthatvisualqualityisimprovedbyproposedlter.Indetails,theedgesofbuildingsismoreregulatethanH.264.The`ringing'eectsaroundtheblackdotsonthewallaresuppressedmostlyaswell. 86

PAGE 87

Performanceofdierenttransformdimensions. 87

PAGE 88

Performancefordierentquantizationparameters. 88

PAGE 89

Visualcomparisonofdetailedcrops.LeftcolumniscodedbyH.264.Rightcolumnislteredbyproposedlter. 89

PAGE 90

66 ],MPEG-1[ 67 ],MPEG-2/H.262[ 68 ],H.263[ 69 ]andMPEG-4[ 70 ]emergethroughout1990s.Intheearly21stcentury,MPEGandVCEGformedaJointVideoTeam(JVT)todevelopnewandpowerfulvideocodingstandard,whichledtothestate-of-the-artvideocodingstandard,H.264[ 71 ][ 72 ][ 73 ].Asahybridvideocodingscheme,H.264incorporatesalotofadvancedvideocodingtechniquesincludingtransform,motioncompensation,quantizationandentropycoding.TheowchartinFigure 5-1 showstheprocessofatypicalH.264encoder.First,aframefromthevideosequenceispartitionedtononoverlapedblocksofsize16x16,alsocalledamacroblock(MB).Macroblockisthebasicprocessingunitinavideocodec.Second,apredictedversionofcurrentMBisyieldedthrougheitherintrapredictionormotioncompensation.ThepredictionerrorwhichisthedierencebetweencurrentMBandpredictedMBistheinputofDCT.Third,thecoecientsofDCTarequantizedbyaquantizer.Finally,quantizedcoecientsareencodedbyanentropycodeandbinarybitstreamwillbethenaloutput. 90

PAGE 91

Hybridvideocodingdiagram. NewfeaturesintroducedintoH.264overotherstandardsare 5-2 .Each8x8blockisfurtherdividedinto8x4,4x8and4x4blocks.Blockbasedmotionestimationisconductedandblockpatternisdeterminedastheonewithminimumpredictionerror.LargeconsistentMBtendstochooselargeblocksize.Complicatemotionarealeadstosmallblocksize. 5-3 and4modesfor16x16blocks. 74 ]. 61 ]. 91

PAGE 92

Figure5-2. PartitionofMB. Whenevaluatingtheperformanceofavideocodec,wecannotsimplycomparethesinglemeasurementsuchasPeakSignaltoNoiseRatio(PSNR)andbitrate.Itismorereasonabletoconsiderthembothatthesametime,whichisalsowhatrate-distortiontheoryaboutis.IfhigherPSNRisdesired,morebitsareneeded.Alltheparametersofcodingtechniquesaredeterminedinarate-distortionoptimized(RDO)fashion,whichmeanstradeoismadebetweenPSNRandbitrate. whereMistheheightofvideo,Nisthewidthofvideo,Iistheoriginalsignaland^Iisthereconstructedsignal. 92

PAGE 93

Intrapredictiondirectionsfor4x4blocks. IntheschemeofH.264,onlypredictionerrorsarefedtotransformandfollowingproceduresforI,PandBframes.Fromtheperspectiveofsparsity,mostnewtechniquescontributingtothesuccessofH.264yieldsparserepresentationinthetransformdomain.Itispromisinginimprovingperformanceifmoresparsecoecientsareobtainedfromnewtechniques. 5.2.1HeterogeneousBlockPattern 73 ],variousblocksizeonlyappearsinmotioncompensation,whichmeanstheintracodedframe(I-frame)onlyhas16x16,four8x8blocksor164x4blocksforeachmacroblockasshowninFigure 5-4A .ForMBswithatcontent,16x16blocksizeisused;ForMBswithdetails,4x4blocksizeisused.However,itisnotwisetouseuniformpatternforthewholemacroblock(MB).ForeachMB,thehomogeneousblockpatterncannotcapturethevariationofcontentadaptively.Weproposeaheterogeneous 93

PAGE 94

5-4 comparesthenalblockpatternsofthesameframeencodedbyH.264andproposedalgorithm.TheresultsfromproposedalgorithmshowninFigure 5-4B demonstratebetteradaptivityincontentstructures.Foratareas,largeblocksize(8x8or16x16)ischosen;Fordetails,4x4blocksareused.ForMBscoveringobjectsboundaries,wecanobservecomplicatedblockpatterns. 75 ],theocialreferencesoftwareforH.264.WeimplementproposedquadtreestructurepatternasadditionalmodetotheexistingmodesinH.264.Todistinguishfromexistingmodes,weneedsomeagandsystem,whichiscalledanoverhead.Weprovideasimplemethodtoencodetheoverhead.WeaddoneagtoindicatetheadaptivemodeaswellasseveralbitstorepresenttheblockpatternintheMB.Themodedecisionisconductedinarate-distortionoptimizationfashion.TheoverheadwillbeinsertedintheMBlayerheaderinthebitstream.SyntaxchangesarelistedinTable 5-1 Table5-1. SyntaxaddedtoMBheader. variablename bits Descriptor ag QT 1 ag QT pattern 4 4bitspattern,1bitforeach8x8 Thismethodmaynotbeoptimal.Theimprovementofencodingoverheadwillbethenextstep. 5.3.1IntraSimilar-blockSearch 13 ].Usingboundarypixelsfromadjacentblocks,thecurrentblockispredictedbasedondesigneddirectionalmethods.Thereare9directionsfor4x4blocksand4directions 94

PAGE 95

B Blockpatterncomparison.(A)HomogeneousblockpatterninH.264.(B)Proposedheterogeneousblockpattern. 95

PAGE 96

Figure5-5. ExampleofsimilarMBs.TwoMBsindicatedbyblockrectanglesareverysimilartoeachother. Analternativewaytoavoidthatistousedatadependantprediction.Infact,thereexistsimilaritiesinsidethesameframeasshowninFigure. 5-5 .Wemaybenetifweusesimilarblockstopredictthecurrentblock.Wecansearchforsimilarblocksintheframe,likemotionestimationamongframes.Blockbasedmotioncompensationisonlyusedtoremovetheredundancyintimedomain.PandBframessearchoverreferenceframestogetamatchedblockasprediction.Onlytheresidualbetweenoriginaldataandprediction 96

PAGE 97

Toencodetheoverhead,hereisasimplemethodweimplemented.First,weextendtheintrapredictionmodewithanadditionalmode.Iftheblockischosenasenhancedpredictionmode,displacementswillbefollowed.Thismethodisnottheoptimaldesignforencodingtheoverhead.TheoverheadwillbeinsertedintheMBlayerheaderinthebitstream.SyntaxchangesarelistedinTable 5-2 : Table5-2. SyntaxaddedtoMBheader. variablename bits Descriptor ag En 1 ag disp hori variable horizontaldisplacement disp vert variable verticaldisplacement 5-3 .ThetestvideosequencescoverfromQCIFtoHD(720p)(Crew,Night,Shuttlestart,City,Bigships)resolutions.TheaveragePSNR 97

PAGE 98

76 ].Theexperimentsettingsarelistedasfollow. Table5-3. Encodingconguration Variable Value Framescoded 30 Prole High QP 22,27,32,37 Entropycoding CABAC RDoptimization enable 5-6 .Thereisanimprovement0.7dBachievedbytheadaptiveblockpattern.Figure. 5-7 showstheperformanceofproposedalgorithmandH.264.Proposedalgorithmhas0.1dBgainoverH.264.TheoverheadleadstothedegradingofgapfromFigure. 5-6 5-8 .Thereisanimprovement0.6dBachievedbyenhancedprediction. Figure. 5-9 showstheperformanceofproposedalgorithmandH.264.Proposedalgorithmhas0.1dBgainoverH.264.TheoverheadleadstothedegradingofgapfromFigure. 5-8 5-10 .Thereis1.3dBPSNRgainoverH.264.Resultsshowthatthesetwoalgorithmsdonot 98

PAGE 99

RDplotsofH.264andproposedalgorithmwithoutoverhead. conictwitheachother.AdaptiveblockpatternandenhancedintrapredictionindeedbothimprovetheRDperformance.Figure. 5-11 showsRDplotswithoverheadsinbitstream.Thecombinationoftwoalgorithmswillintroducetwooverheads.ThatiswhythegainoverH.264issmallerthan0.1dB. 99

PAGE 100

RDplotsofH.264andproposedalgorithmwithoverhead. 100

PAGE 101

RDplotsofH.264andproposedalgorithmwithoutoverhead. 101

PAGE 102

RDplotsofH.264andproposedalgorithmwithoverhead. 102

PAGE 103

RDplotsofH.264andproposedalgorithmwithoutoverhead. 103

PAGE 104

RDplotsofH.264andproposedalgorithmwithoverhead. 104

PAGE 105

77 ][ 78 ][ 79 ][ 80 ]andratecontrolinvideocoding.Itaddressestheproblemofdeterminingminimumsourcesrequirementgivenatolerabledistortion.Itprovidesthetheoreticalboundforpracticalproblems. Figure6-1. Datacompressionsystemdiagram. Inrate-distortiontheory,ratedenotedbyRisdenedasthenumberofbitsusedforperdatasample.Distortionmeasureusuallyusesmeansquareerror(MSE).Adistortionfunctiond(,)mapspairsoforiginalandreconstructedsources(X,^X)toasetofnon-negativerealnumbers. InasystemdescribedbyFigure 6-1 ,wecanobtainasetofpairsof(R,D)bytuningtheparametersofencoder.GivensourceX,arate-distortionpair(R,D)issaidtobeachievableifthereexistsafunctionmappingX!YwithrateRandd(X,^X)D.Theclosureofthesetofachievablerate-distortionpairsiscalledrate-distortionregion.Twofunctionscanbederivedbasedonthedenitionofrate-distortionregion.Oneiscalledrate-distortionfunctionR(D)whichistheinmumofratesRsuchthat(R,D)isintherate-distortionregionofthesourceforagivendistortionD.Theotherisdistortion-ratefunctionD(R)whichistheinmumofdistortionsDsuchthat(R,D)isintherate-distortionregionofthesourceforagivenrateR. Figure 6-2 demonstratestherate-distortionregionforaGaussiansource.Thegrayareadenotestheachievablerate-distortionregion.Nocompressionsystemperformsoutsidethegrayarea.Therate-distortionfunctionatboundaryoftherate-distortion 105

PAGE 106

Figure6-2. Rate-distortionfunctionforaGaussiansource.Achievablerate-distortionregionisthegrayarea. Forquadraticdistortion,onlyaGaussiansourcehasaclosedformrate-distortionfunction.ForothersourceswithLaplacianandgeneralizedGaussiandistributions,therate-distortionfunctionscanbefoundthroughnumericaloptimizationalgorithms[ 81 ].Theratedistortionfunctionspeciestheminimumraterequiredforagiventolerableexpecteddistortion.Thedistortionratefunctionspeciestheminimumdistortionachievableunderagivenratebudget.Thereisalwaysatrade-obetweenthedistortionandtherate.Ifweextendthescenariotomultiplerandomvariables/sources,wecanformulatetheproblemas 106

PAGE 107

VideocompressionproblemcanbeformulatedasEq.( 6{2 ).Dierentsourcesareusuallyframesinthevideosequence.Invideocoding,twotypicalapplications,DVDandIPTV,correspondtothetwoproblemsrespectively.InDVDapplication,theviewersexpectgoodandconsistentvisualqualityacrossframesinthewholemovie.TherateforDVDapplicationsisusuallyat3Mb/sandup.InIPTVapplication,userssharethebandwidthofInternet.Thebandwidthforeachuserissmall,thusvideostreamshavelimitedrates.Theencoderhasstrictconstraintsontherate,whichistypicallyattheorderof300kb/s;otherwise,usersinthesamenetworkswillbeaected.IPTVusuallypresentslargevariationsinqualityacrossframesduetothestrictlimitationinrates. Toaddressthenewproblem,weintroduceconstraintsondistortionuctuations. whereDi=Di+1Di(i=1,,N1),andDisthemaximaltolerableuctuationindistortion. Takingtheconstraintondistortionuctuationsintoaccount,weexploretherelationshipbetweentherateandthechangeofdistortionandderiveaseriesoftheorems,whichcharacterizeanGammaRateTheoryofrate-distortion,i.e.,reducing(increasing,respectively)theratewillresultinincreased(reduced,respectively)distortionuctuation,orviceversa.Thegammaratetheorytellsthatthereisafundamentaltradeobetweentherateandthedistortionuctuation.Wewillalsouseexperimentstovalidatethegammaratetheory. Theremainderofthechapterisorganizedasfollows.Section 6.2 presentsthegammaratetheory.InSection 6.3 ,weapplythegammaratetheorytothedesignofsparsity 107

PAGE 108

6.4 summarizesthischapter. 2log2 Equivalently,wecanobtainthedistortionratefunction DenotefXigNi=1asazero-meanGaussianprocess,withE[X2i]=2i(8i).Tondtherate-distortionfunctionfortheprocess,theproblemcanbeformulatedasbelow Thesolutionto( 6{6 )is[ 80 ] 2log2i whereDi=min(2i,)andsatisesPiDiDmax.( 6{7 )iscalledreversewater-llingsolution. 108

PAGE 109

Reversewater-llingfor7Gaussiansources InFigure 6-3 ,wedemonstratethesolutionEq.( 6{7 )forthecaseof7sources.Thedashedpartsarethedistortionforeachsource.ItcanbeobservedfromFigure 6-3 thatthemaximumdistortionisthevarianceofthesource. Reversewater-llingstrategyistheoptimalsolutionformultiplesources.Reversewater-llingstrategyisnon-causal,i.e.,thecurrentdecisionusesinformationaboutfutureinput.Whendeterminingtheratefortheithframe,thealgorithmhastoprovidewhichguaranteesPiDiDmax.Thenon-causalpropertyisimpossibleforrealtimeapplicationssuchasvideoconference.EvenforapplicationslikeDVD,thisalgorithmwillintroduceextremelyhighcomplexitytogatherinformationfromallframes. Inpracticalsystems,resourceslikebandwidth,delayarealwayseasytodescribethandistortion.Realproblemsareusuallyformulatedas( 6{2 ).Moreover,acausalcontrolstrategyisalwayspreferred.Thegammaratetheoryisintendedtocharacterizethetradeoofrateuctuationanddistortionuctuationunderacausalcontrolstrategy. 109

PAGE 110

6{3 ).Werstdeneafewconceptsasbelow.

PAGE 111

ControllableregioninR{Dplane. 6-4 1)\(R)0. 2)\(R)isadecreasingfunctionforR2[0,R),where 111

PAGE 112

2log2np Qni=12i Proof. 1)From( 6{10 ),itisobviousthat\(R)0. 2)Nowweprovethat\(R)isadecreasingfunctionforR2[0,R),whereR=minR2fR:\(R)=0gR. FortwoarbitrarypointsR1andR2in[0,R).Withoutlossofgenerality,assumeR10andissosmallthatbitscanonlybeusedtoreducejDkj,whiletheresultingj~Dkjunder~R1isstillthemaximumamongfj~Dijgunder~R1.So\(~R1)=j~Dkjsincej~Dkjisthemaximumamongfj~Dijg;andj~Dkjj~Dkj=\(~R1),wehave\(R1)>\(~R1). IncasethereareLsourcesachievingthemaximumvalueofDi,i.e.,kj=argmaxi2f1,,N1gjDij(j=1,,L),welet~R1=R1+,where>0andissosmallthatthereexistfjgLj=1(PLj=1j=)andabitallocationstrategythatguaranteethatjbits(j=1,,L)canonlybeusedtoreducejDkjj,whiletheresultingj~Dkjjunder~R1isstillthemaximumamongj~Dijunder~R1;inotherwords,thesetofsourcesachievingthemaximumvalue,i.e.,I=fi:i=argmaxi2f1,,N1gj~Dijg,containstheLsources(sourcesfkjgLj=1)andthecardinalityjIjL;inotherwords,therecouldbenewsourcesachievingthemaximumvalue.Since\(R1)=jDkjj>j~Dkjj=\(~R1),wehave\(R1)>\(~R1). 112

PAGE 113

SincefortwoarbitrarypointsR1andR2in[0,R)andR1\(R2),hence\(R)isadecreasingfunctionin[0,R). 3)NowweprovethatR=maxn=1,,N1 2log2np 2log2np Firstly,letR=maxn=1,,N1 2log2np 2log2np Qni=12i Then SothereexistsaDsuchthat Therefore,wehave 2log2Qni=12i Thenwehave 2log22i ThenwecanndarateallocationstrategyRwithfRi=1 2log22i 6{15 ).fRi=1 2log22i

PAGE 114

2log2np Secondly,weprovethatR=maxn=1,,N1 2log2np AssumeR
PAGE 115

6{22 )conictswithEq.( 6{17 ),sotheassumptionofR0,wecanndarateallocationstrategyRwithratesfRigNi=1anddistortionsfDigNi=1.WhenchangingfromRtoR,weaddbitstoeachsource.ThatisRi=Ri+fori2f1,,Ng. Thus SincePni=1RinR,wehave SoRiscausalandR-Doptimalrateallocationstrategy. ThedistortionofRare SinceD=Dii2f1,,Ng,wehave Then So 115

PAGE 116

Proof. 6.1 ,\(R)isanon-increasingfunctionin[0,1).Specically,\(R)isadecreasingfunctionforR2[0,R),where\(R)=0;and\(R)=0forR2[R,1). Proof. 6.1 ,\(R)isanon-increasingfunctionin[0,1).So\(R)\(0),R2[0,1).Then\(0)=maxR2[0,1)\(R). Inarealcodec,theachievablenumberofbitsisnotcontinuousbutdiscrete.Theremightbecontrolerrorsbetweenthetargetrateandtherealrateafterencoding.ThenTheorem 6.1 willbemodiedwitherrorsinrate. 1)\(R)0. 2)\(R)isadecreasingfunctionforR2[0,R),whereR=minR2fR:\(R)=0gR Proof. 2log2~2i 116

PAGE 117

ThentheproofreducestotheproofofTheorem 6.1 withmodiedsourcesf~XigNi=1.Aslongastheoriginalsourcef~XigNi=1iscontrollable,Theorem 6.2 iscorrect. Proof. n)n2f1,,Ng(6{31) Whenn!1,B n!0.Thestrategywithbuerreducestoabuerlesscausalstrategyasymptotically.Then\(R)ofthestrategywithbuersstaysthesameasthebuerlessstrategy. Theorem 6.1 providesafundamentaltradeobetweenuserfriendlinessandnetworkfriendliness.UserfriendlinessmeanssmalldistortionuctuationdescribedbythemaximumofjDij;networkfriendlinessmeanssmalldatarate,sincesmalldataratewillconsumesmallbandwidthandyieldsmallcongestioninthenetwork. Theorem 6.1 alsoimpliesthataratecontrolalgorithmwillsuerlargedistortionuctuationiftherateisreduced.Inotherwords,ifaratecontrolalgorithmachieveshigh 117

PAGE 118

Thegammaratefunction\(R)providesthetheoreticalboundsforthemaximumofjDijgivenatargetrateinratecontrolalgorithms.Justliketheratedistortionfunction,thegammaratefunctioncanalsoserveasthebenchmarkforallratecontrolalgorithms. Toevaluatearatecontrolalgorithm,weshouldcombinethegammaratefunctionandtheratedistortionfunction.Inthetriplet(R,D,D),oneofthemshouldbexedtocomparetheothers.Forinstance,thealgorithmwithbetterR-DperformanceisbetterwithxedD.Innextsection,wewillpresentasparsitybasedratecontrolalgorithmthatachievesbetterR-DperformancewithxedDthanH.264. 6.1 bysimulationsinMatlab.ThetestdataisGaussiansourcesfXigNi=1withzeromeansandvariances2i,i=1,...,N.Inthesimulation,wechooseN=100.Theplotof\(R)isshowninFigure 6-5 .The\(R)functionisindeedanon-increasingfunctionin[0,1).WhenRissucientlarge,\(R)=0. 6.3.1RateControlinVideoCoding whereRtisthetargetrateandRiandDidenotetherateanddistortionfortheithframe. Thesolutionto( 6{32 )willbetheoptimalbitallocationamongframes.However,itisdiculttondaclosedformrate-distortionfunctionforapracticalvideocodec.ThereasonisthatthesourceisnotGaussiandistributedandthecodingschemeisnot 118

PAGE 119

optimal.Anotherproblemisthatthenumberofbitsforeachframeincludingbitsforresidualandbitsformotionvectorsandsideinformation(e.g.mode,type,etc),whicharediculttomodel.TheclosedformsolutioncannotbedeterminedasEq.( 6{7 ).Inliterature,variousmodelsareintroducedtoapproximatetheratedistortionfunctionofapracticalvideocodingsystem.Forexample,domainmodel[ 82 ],[ 83 ],linearmodel[ 84 ]andquadraticmodel[ 85 ]areproposedtomodelthefunctionsbetweencodingparametersandrateanddistortion.Modelparametersareestimatedfromvideodata.Theaccuracyofparameterestimationdependsontheinformationgathered.Therearetwomajortypesofstrategies.Onetypeistwopassormultiplepass,whichisanon-causalsolutionto( 6{32 ).Thevideosequenceisrstencodedtocollectstatisticalinformation,whichareusedtoestimatemodelparameters.Codingparametersforthewholesequencearedeterminedbysolving( 6{32 )usingtheestimatedmodel.Severalpassesmaybetakentoreachastableestimation.Inthelastpass,thevideosequenceisre-encodedusingcodingparametersdeterminedbyoptimalsolutionto( 6{32 ).Theproblemwiththisapproachis 119

PAGE 120

6{32 ).Thevideosequenceisencodedonlyonce.Allthemodelparametersareestimatedbasedonup-to-datedata.Thecodingparametersaredeterminedframebyframe.Forapplicationswithconstraintsondelayandcomplexity,single-passstrategyispreferred.Single-passstrategyhaspoorerperformancethantwo-passormultiplepassduetothelackofpriorknowledgeforfutureframes.Ifwecanhaveadditionalinformationaboutfutureframes,theperformanceisexpectedtobeimproved. Forsingle-passratecontrol,weproposedasparsitybasedalgorithmtoimprovetheperformance.InH.264,aMBissaidtobeSKIPmodeifalltransformcoecientsarequantizedtozeros.DenoteSn=[s1,...,sM]T,wheresi=0,iftheithMBhasSKIPmode;otherwise,si=1.LetT=jSj1 82 ].ThedierenceisthatTdenoteszeroratioinMBlevelandacrossframes,anddenoteszeroratioinpixellevelandinsideoneframe.Framesinavideosequencehavehightemporalredundancies.ThenwendthatvectorSnissparse,especiallywhenthebitrateislow.Inotherwords,twoadjacentframesareverysimilar.ThesimilaritycanbedescribedbyT.ItisreasonabletosaythattwoframeswithalargeTaremuchmoresimilarthanthosewithasmallT.Thesimilaritycanhelptomodeltheratedistortionfunctionoftherealcodecaccurately.Bysparsitybasedratecontrol,wemeanthatinthisalgorithm,welevaragethehighpercentageofMBswithSKIPmode.Withaccuratemodels,wecanndapracticalclosedformsolutionto( 6{32 ). Oncethebitallocationisdeterminedbysolving( 6{32 ),thenextstepistomapthetargetratetocodingparameters(i.e.QuantizationParameter(QP)inH.264).TheH.264standardhasintroducedrate-distortionoptimized(RDO)motioncompensationwithmultiplereferenceframesandmodedecisionamongvariousintra-andinter-predictionmodes.Thesenewfeatureshaveimprovedthecodingeciencysignicantly.However,RDOmaketheratecontrolmorecomplicated.Forinstance,ratecontrolalgorithmneedspredictionresidualinformationtodetermineQP.However,predictionresidualcannotbe 120

PAGE 121

86 ],whichmakestheratecontrolinH.264morechallengingthanpreviousstandard.Severalratecontrolalgorithmsareproposedtosolvetheproblem[ 87 ][ 88 ][ 89 ][ 90 ][ 91 ][ 92 ]. 6{32 )basedondierentrate-distortionmodels.Thentheconstraintondistortionuctuationsisintroducedandanewoptimalsolutionisderivedforthenewconstraintproblem. 6{32 )isreducedto whereD(R)istherate-distortionfunction. WecanseparatetheframeintoSKIPregionandnon-SKIPregion.Asmentioned,SKIPmodewillnotconsumeanybitsfromthebudget.Soonlyaportionofbudgetwillencoderesidualdata.Thenthedistortionofthenthframeis whereTnisthepercentageofSKIPMBs,DSnistheaveragedistortionofSKIPregionandDNnistheaveragedistortionofnon-SKIPregion.ThereconstructedpixelvaluesinSKIPregionsareexactlythesameaspreviousframe.Consideringthehighsimilarityofadjacentframes,weassumethatDSn+1DnandDSnisindependentof.Thenthegoalis 121

PAGE 122

Aclosedformsolutionwillbeavailableifweknewtheclosedformofrate-distortionfunctionin( 6{35 ).Thealgorithmcanbeeasilyextendedtomultipleframesbyassumingallfutureframesafter(n+1)thframeconsumethesamerateasthe(n+1)thframe,sincetheevendistributionisasuboptimalapproachwithoutpriors.Dierentrate-distortionmodelwillyielddierentsolutionfor( 6{35 ).Intypicalvideocodingsystems,predictionresidualsareusuallymodeledasGaussian,Laplacian[ 93 ]andCauchydistributions[ 94 ].Wewillderivesolutionsbasedonvariousrate-distortionmodels. IfthesourcehasGaussiandistribution,wehaverate-distortionfunctionas where2isthevarianceofthesourceandDisdenedasmeansquareerror(MSE).Forthenthand(n+1)thframes,allthebitsareusedfortheencodingofnon-SKIPregions.Sowehave where2nand2n+1arethevariancesofthenon-SKIPregionofthenthand(n+1)thframe,respectively.ThenEq.( 6{35 )is 122

PAGE 123

2+1 4Rt[log2((1Tn)(1+Tn+1) 1Tn+1)+log2(2n IfthesourcehasLaplaciandistributionwithpdfp(x)= 2ejxj,wehaverate-distortionfunctionas[ 79 ][ 95 ] whereistheLaplacianparameterandDisdenedasmeanabsolutedierence(MAD).canbeestimatedby^=1 2(6{42) Sowehavethedistortionsfortwoframes wherenandn+1aretheLaplacianparametersofthenon-SKIPregionofthenthand(n+1)thframe,respectively. AlthoughMSEisusedfordistortionmeasure,itisreasonabletoapproximateMSEbyMAD.Thentheproblemcanbesolvedby Set@D=@=0,wecanndtheoptimalsolution 2+1 2Rt[log((1Tn)(1+Tn+1) 1Tn+1)log(n 123

PAGE 124

94 ][ 96 ],CauchydensityisusedtoapproximatetheDCTcoecientdistributionofH.264.AzeromeanCauchydistributionhasthepdf (2+x2),x2R(6{47) whereistheparameterofCauchydistribution.TheentropyfunctionasafunctionofquantizationstepsizeQforaCauchydistributionis[ 94 ] ThedistortionfunctionofaCauchydistributionis Therate-distortionofCauchysourcecanbederivedbasedonEqs.( 6{48 )and( 6{49 ).However,theformulawillbeverycomplicatedandnotsuitableforpracticalusage.Therate-distortionmodelisapproximatedbyasimplerformula whereaandbareparametersestimatedfromvideodata.Theseparameterscanbeestimatedinframelevelandthismodelcanprovidemoreaccurateestimationfordistortion.WhenwetakealogarithmonbothsidesofEq.( 6{50 ),alinearrelationshipcanbeestablished 124

PAGE 125

6-6 showsthatthelinearrelationshipholdsforalmostallframesinvariousvideosequences. Toevaluatetheaccuracyofdierentmodels,weusetheestimationerrorsinrootofmeansquarederrorRMSEandR2.TheR2isaquantityusedtomeasurethedegreeofdatavariationfromagivenmodel.Itisdenedas whereXiand^Xiaretheactualandestimatedvaluesofdatapointi,respectively,andXisthemeanofalldatapoints.TheresultsofRMSEandR2arelistedinTable 6-1 .It'sobviousthatb<0sincelargeratewillreducethedistortion. Table6-1. ModelaccuracyforRD RMSE 0.0625 0.9976 foreman 0.0574 0.9967 mobile 0.0354 0.9994 paris 0.0506 0.9983 Inthistwo-framecase,wecanassumethataandbarethesameforbothframes.UsingCauchyratedistortionfunction,Eq.( 6{35 )isexpendedas Set@D=@=0,theoptimalsolutionis 1+b1r 1Tn+1(6{54) 125

PAGE 126

TherelationbetweenRandD Asimplemethodistousetheheadbitsfrompreviousframeasapredictor.Thenthetargetrateforresidualistheremainderofthetotalbudget.TheQPwillbecalculatedby Q+x2MAD Q2(6{55) wherex1andx2aremodelparameters,MADisthemeanabsolutedierentbetweenoriginalsignalanditspredictionandQisthequantizationstepsize.MADispredictedby 126

PAGE 127

whereMADn1istherealMADof(n1)thframeand1and2aremodelparameters.WhenthecurrentframedisencodedbytheQPcalculatedusingEq.( 6{55 ).AllmodelparametersinEq.( 6{55 )and( 6{56 )willbeestimatedandstoredforfurtherusage. whereTn1istherealTof(n1)thframeand1and2aremodelparameters,whichareupdatedaftereachframeisprocessed. Inthedevelopmentofthealgorithm,wemakeassumptionsabouttheapproximationsofdistortionofSKIPandnon-SKIPareas.Theseassumptionsmaynotbevalidwhenthereisascenechangeinthevideosequence.Sowehavetodetectthesesuddenchanges.Usually,scenechangewillleadtothedecreasingofT.WecanestimateTnbyassumingzeromotioninthenthframe.Inotherwords,thesimpleframedierenceisusedaspredictionresidual.Additionaltransformsandquantizationareintroducedtoobtainanotherestimate~Tn.Then~Tnand^Tnarecomparedlike[ 97 ].If^Tn ThenewsparsitybasedratecontrolisimplementedinJM15.1[ 75 ].Thewholealgorithmisdescribedasfollows. 1. Initialbitallocationforaframe. AllocateacertainbitbudgettothecurrentframeRtaccordingtotheapproachinJVT-G012[ 98 ]. 2. Calculatetargetrate. 127

PAGE 128

3. MaptoQP Accordingtothebudgetofcurrentframe,determinetheQPforencodingusingR-QmodelinEq.( 6{55 ). 4. Modelupdate. Updatethemodelparametersusedinthealgorithmbasedonthedataincurrentframeandpreviousframes.Themodelparametersareestimatedbytheleastsquareapproximation. ExperimentSettings Toevaluatetheperformanceofproposedalgorithm,weusearealvideocodec.Thestate-of-the-artvideocodingstandardH.264isused.TheencodingowchartisdemonstratedinFigure 6-7 .Theproposedalgorithmispartofthe"codercontrol"moduleonthetopofthediagram. Figure6-7. EncodingprocesswithratecontrolinH.264. 128

PAGE 129

98 ]algorithmforratecontrol.Theratecontroloperatesatframelevel.BothalgorithmsusethesameR-Qmodel.Theonlydierenceisthatproposedalgorithmusesadaptivebitallocationinsteadofaveragedistribution. Basedonthegammaratetheory,toevaluatearatecontrolalgorithm,weneedtousethetriplet(R,D,D)toquantifytheperformance.SinceproposedalgorithminthissectionandtheratecontrolalgorithminH.264cannotcontrolthedistortionuctuation,wecomparetheR-DperformanceofproposedalgorithmandtheratecontrolalgorithminH.264.Atthesametime,welistthegainsofdistortionuctuationsbetweentwoalgorithms.IfproposedalgorithmimprovestheR-DperformanceoverH.264,thatmeansitneedssmallerratetoachievethesamePSNRasH.264does.Accordingtothegammaratetheory,largerdistortionuctuationsareexpected. InFigure 6-8 ,wepresentratedistortioncurvesofproposedalgorithm.WecomparedproposedalgorithmbasedonGaussian,LaplacianandCauchymodelswithH.264.ResultsshowthatproposedalgorithmimprovetheaveragePSNRoverH.264foralltestedvideosequences.Detailedexperimentalresultsareavailableinfollowingsections.TheaverageratedierencebetweenproposedalgorithmandJVT-G012inH.264iscalculatedas[ 76 ].Theaveragedierenceisthepercentageofratechangew.r.ttotheratesinJVT-G012forgivenPSNR.WealsolisttheaveragedierenceofDbetweenproposedalgorithmandJVT-G012inH.264. Inthisexperiment,weuseGaussianratedistortionfunctiontoobtaintheoptimalbitallocation.TheexperimentalresultsarelistedinTable 6-2 .Thereare4.98%ratesaving 129

PAGE 130

RatedistortioncomparisonbetweenH.264andproposedalgorithmwithdierentmodels. onaverageoverH.264.Wecanobservethatforalltestedsequencestheratesdecreaseanddistortionuctuationsincrease.Itmeansproposedalgorithmsavesmorebitsinthecostofintroducingmoredistortionuctuations.Sothereisatradeobetweenrateanddistortionuctuation. ExperimentalresultsusingLaplacianmodelarelistedinTable 6-3 .Thereareonaverage5.03%ratesavingcomparedtoH.264.Forsequence`hallmonitor',theaverageratesavingis12%.Whileachievingratesaving,theproposedalgorithmsuerslargedistortionuctuations.InTable 6-3 ,alltheDdierencesaregreaterthan0,whichmeans 130

PAGE 131

PerformancecomparisonofH.264andproposedalgorithmusingGaussianmodel Sequence R(kbps) JVT-G012 Proposed average rate PSNR rate PSNR ratedierence (dB) hall monitor 64 64.27 34.45 64.62 34.73 -11.9% 0.48 128 130.37 36.47 129.41 36.79 256 256.69 38.11 257.3 38.54 384 384.48 39.11 384.45 39.29 waterfall 64 67.01 28.34 66.91 28.26 -1.3% 0.16 128 129.15 31.87 129.77 31.56 256 256.47 34.75 257.33 35.46 384 383.32 36.27 385.8 36.96 bus 64 82.61 22.84 82.23 22.82 -0.7% 0.15 128 131.79 25.1 136.73 25.21 256 256.04 28.04 257.45 28.2 384 384.04 29.82 386.37 29.99 news 64 64.33 32.72 66.76 32.59 -4.1% 0.07 128 128.12 35.54 128.89 36.05 256 256.19 39.54 256.99 39.64 384 384.48 41.55 383.98 41.58 paris 64 66.36 26.57 67.7 26.42 -6.8% 0.29 128 128.16 28.91 129.49 29.19 256 256.33 31.7 258.48 32.25 384 383.84 33.55 385.43 34.52 proposedalgorithmintroducesmoredistortionuctuations.ProposedalgorithmtradeslargedistortionuctuationforsmallratewhenachievingthesamePSNRasH.264. InTable 6-4 ,resultsfromproposedalgorithmbasedonCauchymodelachieve5.1%ratesavingoverH.264onaverage.TheperformanceisbetterthanLaplacianandGaussianmodel.SoCauchymodelisausefultooltodescribetherate-distortionfunction.InTable 6-4 ,wealsoobservethatproposedalgorithmtradeslargedistortionuctuationsforsmallrates. Experimentalresultsofsparsitybasedratecontrolalgorithmwithoutconstraintsondistortionuctuationsshowthatthereisatradeobetweenrateanddistortionuctuations,whichvalidatesthegammaratetheory. 131

PAGE 132

PerformancecomparisonofH.264andproposedalgorithmusingLaplacianmodel Sequence R(kbps) JVT-G012 Proposed average rate PSNR rate PSNR ratedierence (dB) hall monitor 64 64.27 34.45 64.62 34.73 -12% 0.48 128 130.37 36.47 129.41 36.79 256 256.69 38.11 257.3 38.54 384 384.48 39.11 385.32 39.3 waterfall 64 67.01 28.34 66.91 28.26 -1.4% 0.16 128 129.15 31.87 129.61 31.57 256 256.47 34.75 256.97 35.46 384 383.32 36.27 386.13 37.07 bus 64 82.61 22.84 82.61 22.84 -0.8% 0.23 128 131.79 25.1 135.43 25.19 256 256.04 28.04 257.68 28.18 384 384.04 29.82 388.84 30.02 news 64 64.33 32.72 66.76 32.59 -4.1% 0.07 128 128.12 35.54 128.89 36.05 256 256.19 39.54 256.99 39.64 384 384.48 41.55 383.98 41.58 paris 64 66.36 26.57 67.7 26.42 -6.8% 0.29 128 128.16 28.91 129.49 29.19 256 256.33 31.7 258.48 32.25 384 383.84 33.55 385.43 34.52 6.3.3.1Algorithmframework whereDisagiventhresholdforviewerquality.TheproblemissolvedbyLagrangianmultiplier,sowehave 132

PAGE 133

PerformancecomparisonofH.264andproposedalgorithmusingCauchymodel Sequence R(kbps) JVT-G012 Proposed average rate PSNR rate PSNR ratedierence (dB) hall monitor 64 64.27 34.45 64.62 34.73 -12% 0.48 128 130.37 36.47 129.41 36.79 256 256.69 38.11 257.3 38.54 384 384.48 39.11 385.32 39.3 waterfall 64 67.01 28.34 66.91 28.26 -1.7% 0.13 128 129.15 31.87 129.61 31.57 256 256.47 34.75 257.56 35.51 384 383.32 36.27 385.97 37.01 bus 64 82.61 22.84 82.61 22.84 -0.8% 0.36 128 131.79 25.1 135.43 25.19 256 256.04 28.04 259.29 28.2 384 384.04 29.82 386.65 30.08 news 64 64.33 32.72 66.76 32.59 -4.1% 0.07 128 128.12 35.54 128.89 36.05 256 256.19 39.54 256.99 39.64 384 384.48 41.55 383.98 41.58 paris 64 66.36 26.57 67.7 26.42 -6.8% 0.29 128 128.16 28.91 129.49 29.19 256 256.33 31.7 258.48 32.25 384 383.84 33.55 385.43 34.52 whereisaLagrangianmultiplierandalsoafunctionofD. TakingtheSKIPregionintoaccount,wecanwritethedistortionas WiththeassumptionthatDSn+1Dn,Eq.( 6{59 )canbewrittenas Thesolutioncanbefoundwhen 133

PAGE 134

IfthesourcehasGaussiandistribution,wecanexpandEq.( 6{61 )as ThesolutionofEq.( 6{64 )canbefoundbysetting@F BecauseDisjustabound,theoptimalsolutionsatises^.Tond^,wehavetoreachtheonewithminimalcost.Thesearchrangeis[0.5,^].@F=@=0yields Wecangetscorrespondingto2[0.5,]usingEq.( 6{65 ). Oncewendtheoptimal,theoptimal^is 2+1 4Rt[log2((1Tn)[(1+Tn+1)(1Tn+1)] (1Tn+1)(1))2log2(n+1 However,anobservationisthatunderthismodel,Dincreasesalong.Butourgoalistominimizetotaldistortion.Weknewtheunconstraintsolution. 2+1 4Rt[log2((1Tn)(1+Tn+1) 1Tn+1)+log2(2n Theconstraintsondistortionuctuationactuallyintroduceaconstraintontheregionof.Thecorrespondingboundis 2Rtlog2C+p whereC=DTnDSn(1Tn+1) 1Tn+1. 134

PAGE 135

IftheconstraintisgivenindB,wehave Letr=10D=10,wehave Thecorrespondingboundofregionof +=1 2Rtlog2C+q whereC=TnDSn(1rTn+1). Sothenaloptimalsolutionis SincethedistortionofLaplacianmodelisdenedasMAD,itisnotstraightforwardtoderivethesolutiongiventheconstraintsonuctuationofdistortions. WithCauchyapproximationmodel,theconstraintondistortionuctuationis Theboundarypoint+correspondingtotheconstraintwillbe aRbt(6{75) 135

PAGE 136

1Tn+1. IftheconstraintDisgivenindB,letr=10D=10.Thenwehave whereC=TnDSn(1rTn+1) 75 ].Theimplementationissimilartoalgorithmwithoutconstraintsondistortionuctuation.TheonlydierenceisthatweaddtheuctuationcheckafterweencodethecurrentframebyestimatedQP.Thischeckwillverifytheconstraintsondistortionuctuation.Ifthecheckfails,itwillre-encodetheframewithmodiedQPuntilthecheckismet.Sothisalgorithmwillintroducecomplexityduetore-encodingframes. Thewholealgorithmisdescribedasfollows. 1. Initialbitallocationforaframe. AllocateacertainbitbudgettothecurrentframeRtaccordingtotheapproachinJVT-G012[ 98 ]. 2. Calculatetargetrate. Usingthemethodtondtheoptimalsolution^.Ifallconditionsaremet,modifythetargetrateofcurrentframetoR0t=M^Rt,whereMisthenumberofframesconsideredinthebitallocationalgorithm.Otherwise,leavethetargetrateunchanged. 3. MaptoQP Accordingtothebudgetofcurrentframe,determinetheQPforencodingusingR-QmodelinEq.( 6{55 ). 136

PAGE 137

Fluctuationcheck. Checkthedierenceofdistortionbetweencurrentframeandpreviousframes.Ifthedistortionuctuationconstraintisviolated,increaseordecreasecurrentQPandre-encodethecurrentframeuntiltheconstraintismet. 5. Modelupdate. Updatethemodelparametersusedinthealgorithmbasedonthedataincurrentframeandpreviousframes.Themodelparametersareestimatedbytheleastsquareapproximation. Tomakeafaircomparison,wemodiedtheratecontrolalgorithminH.264byaddingthesameuctuationcheckstepattheendofframeencoding.WereferthisnewalgorithmforH.264asmodiedH.264intherestofthischapter. ExperimentSettings Toevaluatetheperformanceofproposedalgorithm,weusearealvideocodec.Thestate-of-the-artvideocodingstandardH.264isused.TheencodingowchartisdemonstratedinFigure 6-7 .Theproposedalgorithmispartofthe"codercontrol"moduleonthetopofthediagram.Inourexperiments,allframesareencodedasP-framesexceptfortherstI-frame.Thereare150framesforevaluation.SeveralcommontestvideosequencesinCIFtypearetested.Allalgorithmaretestedwithfourtargetrates(i.e.64,128,256,384kbps).Theratecontrolalgorithmareoperatingatframelevel.Inourexperiments,wechoosetheconstraintchangesfrom0.4dBto1.4dBwithstepsize0.1dB. Basedonthegammaratetheory,toevaluatearatecontrolalgorithm,weneedtousethetriplet(R,D,D)toquantifytheperformance.Inparticular,wecomparetheR-DperformancesofratecontrolalgorithmswiththesameD. Withthesamedistortionuctuation,i.e.D,proposedalgorithmachievesbetterR-DperformancethanmodiedH.264fordierenttargetbitratesanddierentsequences.Figure 6-9 showstheR-DperformancesofproposedalgorithmandmodiedH.264withD=0.8dBforvarioustestvideosequences.ItcanbeobservedthatproposedalgorithmoutperformsmodiedH.264inR-Dperformance. 137

PAGE 138

B`waterfall' C`news' RatedistortioncomparisonswithD=0.8dB. 138

PAGE 139

6-10 and 6-11 ,bothalgorithmcanusesmallrateandlargedistortionuctuationorlargerateandsmalldistortionuctuationtoachievethesamePSNR.Inotherwords,toachievethesamedistortion,atradeowillbemadebetweenrateanddistortionuctuation.Thisispredictedbythegammaratetheory. B38.59dB 139

PAGE 140

B38.62dB 140

PAGE 141

141

PAGE 142

Thisdissertationexploredalgorithmsinimageandvideoprocessingbasedonsparsity.Twodierentapproacheswereadopted:dataindependentanddatadependentapproaches.ThebackgroundandrelatedworkwereintroducedinChapter 1 Chapter 2 proposedanewtransformcalledRipplettransformtypeIbygeneralizingCurvelettransformtorepresentcurvesmoreecientlyinimages.Weintroducedsupportcanddegreedinadditiontoscale,locationanddirectionparametersinthedenition,whilethecurvelettransformisjustaspecialcaseoftheripplettransformtypeIwithc=1andd=2.Theexibilityenabledthecapabilityofyieldingmoreecientrepresentationsforimageswithsingularities.Ripplet-Ifunctionsformanewtightframeforthefunctionalspace.Therippletshavegoodlocalizationinbothspatialandfrequencydomain.Inparticular,wedevelopedforwardandbackwardRipplettransformtypeIforbothcontinuousanddiscretecases.Thehighlydirectionalrippletshavegeneralscalingwitharbitrarydegreeandsupport,whichcancapture2Dsingularitiesalongdierentcurvesinanydirections.Toevaluatetheperformanceofproposedtransform,RipplettransformtypeIwasappliedtoimagecompressionandimagedenoising.Experimentalresultsindicatedthatripplet-Itransformcanprovidemoreecientrepresentationsofimageswithsingularitiesalongsmoothcurves.Usingafewcoecients,ripplet-IcanoutperformDCTandwavelettransforminnonlinearapproximation.Itispromisingtocombineripplet-ItransformandothertransformssuchasDCTtorepresenttheentireimage,whichcontainsobjectboundariesandtextures.Ripplet-Itransformisusedtorepresentthestructureandtexture,whileDCTisusedtocompresssmoothparts.Thesparserepresentationofripplet-Itransformalsodemonstratedpotentialsuccessinimagedenoising. 142

PAGE 143

3 ,weintroducedRipplettransformtypeIIbasedongeneralizedRadontransformforresolving2Dsingularities.Thenewtransformconverts2Dsingularitiesinto1DsingularitiesthroughgeneralizedRadontransform.Then1Dwavelettransformisusedtoresolvethe1Dsingularities.Bothforwardandinverseripplet-IItransformweredevelopedforcontinuousanddiscretecases.Ripplet-IItransformwithd=2canachievesparserrepresentationfor2Dimages,comparedtoridgelet.Somepropertiesofthenewtransformareexplored.Ripplet-IItransformalsoenjoysrotationinvariantproperty,whichcanbeleveragedbyapplicationssuchastextureclassicationandimageretrieval.Experimentsintextureclassicationandimageretrievaldemonstratedthattheripplet-IItransformbasedschemeoutperformswaveletandridgelettransformbasedapproaches. Chapter 4 presenteddatadependentapproaches.Inparticular,weproposedageneralframeworkthatexploresselfsimilaritiesinsidesignalstoachievesparsity.Theframeworkuniedthreesparsitybaseddenoisingtechniquesandappliedthemforvideocompressionartifactremovalproblem.Wecomparedthede-artifactingperformanceofthealgorithmsconsideringthreeaspects:patchdimension,transformdimension,andquantizationparameter.Duringthecomparison,simulationswereperformedtosupportouranalysis,whichmayprovideguidelinesforapplyingsimilardenoisingalgorithmstovideocompressionworkinthefuture. InChapter 5 ,weproposedseveraltechniquestoimprovetheperformanceofvideocodingthroughsparsityenhancementofpredictionresiduals.TocapturethestructureofcontentsinaMB,heterogeneousblockpatternwasintroducedtotthecontentadaptively.Inadditiontomodelbasedintraprediction,weproposedtodirectlysearchforthebestpredictioninpre-encodedimageregions.Thesetechniquescanimprovethevideocodingperformancebyimprovingthesparsityofpredictionresidualdata.ThealgorithmswereimplementedbasedonthehybridvideocodingframeworkinH.264.ExperimentalresultsdemonstratedgainsoverH.264byproposedalgorithms. 143

PAGE 144

6 ,weexploredtherate-distortionbehaviorsafterintroducingnewconstraintsonthesmoothnessofdistortion.Uncertaintyprinciplewasproposedtoprovideguideforratecontrolalgorithmdesigning.Thereisatradeobetweenthedistortionuctuationandthetargetrate.Asparsitybasedratecontrolalgorithmwasproposedtoimprovethecodingeciencywithlimitationonbitrate.Rate-distortionmodels:Gaussian,LaplacianandCauchymodels,wereemployedtoderivetheoptimalbitallocationforcaseswithandwithoutconstraintsondistortionsmoothness.Theproposedalgorithmwasevaluatedinthe(R,D,D)fashionbasedonthegammaratetheory.Experimentalresultsdemonstratedthatthegammaratetheoryisvalidforpracticalratecontrolschemes. 6 ,weproposedasparsitybasedratecontrolalgorithm.Theratecontrolalgorithmworksforbitallocationamongmultipleframes.Thereareseveraldirectionstoextendthealgorithm. 83 ].TodistinguishwithTusedinChapter 6 ,weuseShere.ScarriesthesparsityinformationinspatialdomainandTdescribesthesparsityinformationintemporaldomain.Wecanderiveratecontrolalgorithmsbyexploringsparsityinbothspatialandtemporaldomains.AjointschemeofSandTwillbedesigned.ByjointlytuningSandT,wecanexpectsuperiorperformanceoverexistingapproaches. 144

PAGE 145

[1] J.Tropp,Topicsinsparseapproximation,Ph.D.dissertation,UniversityofTexasatAustin,2004. [2] D.L.Donoho,M.Vetterli,R.A.Devore,andI.Daubechies,\Datacompressionandharmonicanalysis,"IEEETransactionsonInformationTheory,vol.44,no.6,pp.2435{2476,October1998. [3] A.Oppenheim,A.Willsky,andI.Young,Signalsandsystems,PrenticeHallEnglewoodClis,NJ,1983. [4] H.KramerandM.Mathews,\Alinearcodingfortransmittingasetofcorrelatedsignals,"IRETransactionsonInformationTheory,vol.2,no.3,pp.41{46,1956. [5] S.Watanabe,\Karhunen{Loeveexpansionandfactoranalysis:theoreticalremarksandapplications,"PatternRecognition:IntroductionandFoundations,pp.635{660,1973. [6] N.Ahmed,T.Natarajan,andK.Rao,\Discretecosinetransform,"IEEETransac-tionsonComputers,vol.100,no.23,pp.90{93,1974. [7] G.Wallace,\TheJPEGstillpicturecompressionstandard,"IEEETransactionsonConsumerElectronics,vol.38,no.1,1992. [8] C.Burrus,R.Gopinath,andH.Guo,Introductiontowaveletsandwavelettrans-forms:aprimer,PrenticeHall,1998. [9] M.VetterliandC.Herley,\Waveletsandlterbanks:Theoryanddesign,"IEEETransactionsonSignalProcessing,vol.40,no.9,pp.2207{2232,September1992. [10] M.VetterliandJ.Kovacevic,Waveletandsubbandcoding,Prentice-Hall,EnglewoodClis,NJ,1995. [11] I.Daubechies,Tenlecturesonwavelets,SIAM,Philadelphia,PA,1992. [12] S.Mallat,Awavelettourofsignalprocessing,Academic,NewYork,2ndedition,1999. [13] C.Christopoulos,A.Skodras,andT.Ebrahimi,\TheJPEG2000stillimagecodingsystem:anoverview,"IEEETransactionsonConsumerElectronics,vol.46,no.4,pp.1103{1127,2000. [14] R.Bracewell,\Thefouriertransform,"ScienticAmerican,vol.260,no.6,pp.86{95,1989. [15] E.Brigham,ThefastFouriertransformanditsapplications,PrenticeHallEnglewoodClis,NJ,1988. 145

PAGE 146

J.FosterandF.Richards,\TheGibbsphenomenonforpiecewise-linearapproximation,"TheAmericanMathematicalMonthly,vol.98,no.1,pp.47{49,1991. [17] A.Jerri,TheGibbsphenomenoninFourieranalysis,splines,andwaveletapproxima-tions,Springer,1998. [18] I.Daubechies,\Thewavelettransform,time-frequencylocalizationandsignalanalysis,"IEEETransactionsonInformationTheory,vol.36,no.5,pp.961{1005,1990. [19] E.J.CandesandD.L.Donoho,\Ridgelets:akeytohigherdimensionalintermittency?,"PhilosophicalTransactions:Mathematical,PhysicalandEngineeringSciences,pp.2459{2509,1999. [20] E.J.Candes,Ridgelet:TheoryandApplication,Ph.D.dissertation,StanfordUniversity,1998. [21] E.J.CandesandD.L.Donoho,\Newtightframesofcurveletsandoptimalrepresentationsofobjectswithpiecewisec2singularities,"CommunicationsonPureandAppliedMathematics,vol.57,no.2,pp.219{266,February2003. [22] E.J.Candes,L.Demanet,D.L.Donoho,andL.Ying,\Fastdiscretecurvelettransforms,"MultiscaleModelingandSimulation,vol.5,pp.861{899,2005. [23] G.Easley,D.Labate,andW.Lim,\Sparsedirectionalimagerepresentationsusingthediscreteshearlettransform,"AppliedandComputationalHarmonicAnalysis,vol.25,no.1,pp.25{46,2008. [24] D.Labate,W.Lim,G.Kutyniok,andG.Weiss,\Sparsemultidimensionalrepresentationusingshearlets,"inSPIEProceedings5914,SPIE,Bellingham,WA,2005,pp.254{262. [25] B.AtalandS.Hanauer,\Speechanalysisandsynthesisbylinearpredictionofthespeechwave,"JournaloftheAcousticalSocietyofAmerica,vol.50,no.2,pp.637{655,1971. [26] H.Strube,\Linearpredictiononawarpedfrequencyscale,"TheJournaloftheAcousticalSocietyofAmerica,vol.68,pp.1071,1980. [27] A.HarmaandU.Laine,\Acomparisonofwarpedandconventionallinearpredictivecoding,"IEEETransactionsonSpeechandAudioProcessing,vol.9,no.5,pp.579{588,2001. [28] J.Gibson,Digitalcompressionformultimedia:principlesandstandards,MorganKaufmann,1998. [29] K.PohlmannandK.Pohlman,Principlesofdigitalaudio,McGraw-Hill,2005. 146

PAGE 147

G.Box,G.Jenkins,andG.Reinsel,Timeseriesanalysis:forecastingandcontrol,Holden-daySanFrancisco,1976. [31] O.Guleryuz,\Weightedovercompletedenoising,"inConferenceRecordoftheThirty-SeventhAsilomarConferenceonSignals,SystemsandComputers,2003,vol.2. [32] Y.NakayaandH.Harashima,\Motioncompensationbasedonspatialtransformations,"IEEETransactionsonCircuitsandSystemsforVideoTechnology,vol.4,no.3,pp.339{356,1994. [33] T.Wedi,\MotioncompensationinH.264/AVC,"IEEETransactionsonCircuitsandSystemsforVideoTechnology,vol.13,no.7,pp.577{586,2003. [34] K.Dabov,A.Foi,V.Katkovnik,andK.Egiazarian,\ImageDenoisingbySparse3-DTransform-DomainCollaborativeFiltering,"IEEETransactionsonImageProcessing,vol.16,no.8,pp.2080,2007. [35] K.Dabov,A.Foi,V.Katkovnik,andK.Egiazarian,\VideoDenoisingbySparse3-DTransform-DomainCollaborativeFiltering,"inProceedingsof15thEuropeanSignalProcessingConference,Sep.2007. [36] X.LiandY.Zheng,\Patch-basedvideoprocessing:avariationalBayesianapproach,"IEEETransactionsonCircuitsandSystemsforVideoTechnology,vol.19,no.1,pp.27{40,2009. [37] M.DoandM.Vetterli,\Theniteridgelettransformforimagerepresentation,"IEEETransactionsonImageProcessing,vol.12,no.1,pp.16{28,January2003. [38] S.R.Deans,TheRadontransformandsomeofitsapplications,JohnWiley&Sons,NewYork,1983. [39] J.L.Starck,E.J.Candes,andD.L.Donoho,\Thecurvelettransformforimagedenoising,"IEEETransactionsonImageProcessing,vol.11,pp.670{684,June2002. [40] D.L.DonohoandM.R.Duncan,\Digitalcurvelettransform:strategy,implementationandexperiments,"inProceedingsofAerosense2000,WaveletApplicationsVII.SPIE,2000,vol.4056,pp.12{29. [41] E.J.CandesandD.L.Donoho,\Continuouscurvelettransform:I.Resolutionofthewavefrontset,"AppliedandComputationalHarmonicAnalysis,vol.19,no.2,pp.162{197,2005. [42] E.J.CandesandD.L.Donoho,\Continuouscurvelettransform:II.Discretizationandframes,"AppliedandComputationalHarmonicAnalysis,vol.19,pp.198{222,2005. [43] L.Hormander,Theanalysisoflinearpartialdierentialoperators,Springer-Verlag,Berlin,2003. 147

PAGE 148

M.N.DoandM.Vetterli,\Thecontourlettransform:anecientdirectionalmultiresolutionimagerepresentation,"IEEETransactionsonImageProcessing,vol.14,no.12,pp.2091{2106,December2005. [45] M.N.DoandM.Vetterli,\Contourlets,"inBeyondwavelets,G.V.Welland,Ed.AcademicPress,NewYork,2003. [46] E.LePennecandS.Mallat,\Sparsegeometricimagerepresentationswithbandelets,"IEEETransactionsonImageProcessing,vol.14,no.4,pp.423{438,2005. [47] A.Cohle,I.Daubechies,andJ.Feauveau,\Biothogonalbaseofcompactlysupportedwavelets,"CommunicationsonPureandAppliedMathematics,vol.4,pp.45{47,1992. [48] D.Taubman,\HighperformancescalableimagecompressionwithEBCOT,"IEEETransactionsonImageProcessing,vol.9,no.7,pp.1158{1170,2000. [49] D.Taubman,E.Ordentlich,M.Weinberger,G.Seroussi,I.Ueno,andF.Ono,\EmbeddedblockcodinginJPEG2000,"inProceedingsofIEEEInternationalConferenceonImageProcessing,2000,vol.2. [50] \Textures,"http://sipi.usc.edu/database/database.cgi?volume=textures. [51] J.XuandD.Wu,\Ripplettransformforfeatureextraction,"inProceedingsofSPIEDefenseSecuritySymposium,March2008,vol.6970,pp.69700X{69700X{10. [52] J.XuandD.Wu,\Ripplet-IItransformforfeatureextraction,"inProceedingsofSPIEVisualCommunicationsandImageProcessing,July2010. [53] A.Cormack,\TheRadontransformonafamilyofcurvesintheplane(I),"Proceed-ingsoftheAmericanMathematicalSociety,vol.83,no.2,pp.325{330,1981. [54] A.Cormack,\TheRadontransformonafamilyofcurvesintheplane(II),"ProceedingsoftheAmericanMathematicalSociety,vol.86,no.2,pp.293{298,1982. [55] F.NATTERER,Themathematicsofcomputerizedtomography,SIAM,2001. [56] K.Denecker,J.VanOverloop,andF.Sommen,\ThegeneralquadraticRadontransform,"InverseProblems,vol.14,no.3,pp.615{634,1998. [57] \Usc-sipiimagedatabase,"http://sipi.usc.edu/database. [58] \Rotatedtextures,"http://sipi.usc.edu/database/database.cgi?volume=rotate. [59] C.PunandM.Lee,\Log-polarwaveletenergysignaturesforrotationandscaleinvarianttextureclassication,"IEEETransactionsonPatternAnalysisandMachineIntelligence,vol.25,no.5,pp.590{603,2003. [60] I.Jollie,Principalcomponentanalysis,SpringerVerlag,2002. 148

PAGE 149

P.List,A.Joch,J.Lainema,G.Bjntegaard,andM.Karczewicz,\Adaptivedeblockinglter,"IEEETransactionsonCircuitsandSystemsforVideoTech-nology,vol.13,no.7,pp.614{619,2003. [62] C.Dorea,O.Escoda,P.Yin,andC.Gomila,\ADirection-adaptiveIn-loopDeartifactingFilterforVideoCoding,"inProceedingsofIEEEInternationalConferenceonImageProcessing,2008,pp.1624{1627. [63] O.Guleryuz,\Anonlinearlooplterforquantizationnoiseremovalinhybridvideocompression,"inProceedingsofIEEEInternationalConferenceonImageProcessing,2005. [64] R.GonzalezandR.Woods,\Digitalimageprocessing,"AddisonWisley,1992. [65] R.BrownandP.Hwang,IntroductiontorandomsignalsandappliedKalmanltering,JohnWiley&Sons,1992. [66] ITU-T,\VideoCodecforAudiovisualServiceatPx64Kbits/s,"ITU-Trecommenda-tionH.261version1,1990. [67] I.J.1,\CodingofMovingPicturesandAssociatedAudio{forDigitalStorageMediaatuptoabout1.5Mbit/s-Part2:Video,"ISO/IEC11172(MPEG1),November1993. [68] ITU-TandI.J.1,\Genericcodingofmovingpicturesandassociatedaudioinformation-Part2:Video,"ITU-TRec.H.262andISO/IEC13818-2(MPEG-2),November1994. [69] ITU-T,\Videocodingforlowbitratecommunication,"ITU-TrecommendationH.263version1,1995. [70] I.J.1,\Codingofaudio-visualobjects-Part2:Visual,"ISO/IEC14496-2(MPEG-4Part2),January1999. [71] I.Recommendation,\H.264/AVCVideoCodingStandard,"2003. [72] G.SullivanandT.Wiegnad,\VideocompressionfromconceptstotheH.264/AVCstandard,"ProceedingsoftheIEEE,vol.93,no.1,pp.18{31,2005. [73] T.Wiegand,G.Sullivan,G.Bjontegaard,andA.Luthra,\OverviewoftheH.264/AVCvideocodingstandard,"IEEETransactionsonCircuitsandSystemsforVideoTechnology,vol.13,no.7,pp.560{576,2003. [74] T.Wiegand,X.Zhang,andB.Girod,\Long-termmemorymotion-compensatedprediction,"IEEETransactionsonCircuitsandSystemsforVideoTechnology,vol.9,no.1,pp.70{84,1999. [75] \H.264/avcreferencesoftware,"http://iphome.hhi.de/suehring/tml/download/. 149

PAGE 150

G.Bjontegaard,\CalculationofaveragePSNRdierencesbetweenRD-curves,"Tech.Rep.VCEG-M33,ITU-TQ.6/SG16,April2001. [77] C.Shannon,\Amathematicaltheoryofcommunication,"BellSystemTechnicalJournal,vol.27,pp.379{423,1948. [78] C.Shannon,\Codingtheoremsforadiscretesourcewithadelitycriterion,"Informationanddecisionprocesses,pp.93{126,1960. [79] T.Berger,Ratedistortiontheory,Prentice-HallEnglewoodClis,NJ,1971. [80] R.Gray,Sourcecodingtheory,Springer,1990. [81] T.CoverandJ.Thomas,Elementsofinformationtheory,Wiley,2006. [82] Z.HeandS.Mitra,\AlinearsourcemodelandauniedratecontrolalgorithmforDCTvideocoding,"IEEETransactionsonCircuitsandSystemsforVideoTechnology,vol.12,no.11,pp.970{982,2002. [83] Z.HeandS.Mitra,\Optimumbitallocationandaccurateratecontrolforvideocodingvia/splrho/-domainsourcemodeling,"IEEETransactionsonCircuitsandSystemsforVideoTechnology,vol.12,no.10,pp.840{849,2002. [84] Y.Liu,Z.Li,andY.Soh,\AnovelratecontrolschemeforlowdelayvideocommunicationofH.264/AVCstandard,"IEEETransactionsonCircuitsandSystemsforVideoTechnology,vol.17,no.1,pp.68{78,2007. [85] T.ChiangandY.Zhang,\Anewratecontrolschemeusingquadraticratedistortionmodel,"IEEETransactionsonCircuitsandSystemsforVideoTechnology,vol.7,no.1,pp.246{250,1997. [86] Z.Li,W.Gao,F.Pan,etal.,\AdaptiveRateControlforH.264,"JournalofVisualCommunicationandImageRepresentation,vol.17,no.2,pp.376{406,2006. [87] M.JiangandN.Ling,\OnLagrangemultiplierandquantizeradjustmentforH.264frame-layervideoratecontrol,"IEEETransactionsonCircuitsandSystemsforVideoTechnology,vol.16,no.5,pp.663{669,2006. [88] S.Ma,W.Gao,andY.Lu,\Rate-distortionanalysisforH.264/AVCvideocodinganditsapplicationtoratecontrol,"IEEETransactionsonCircuitsandSystemsforVideoTechnology,vol.15,no.12,pp.1533{1544,2005. [89] H.Yu,Z.Lin,andF.Pan,\AnimprovedratecontrolalgorithmforH.264,"inIEEEInternationalSymposiumonCircuitsandSystems,2005,pp.312{315. [90] P.YinandJ.Boyce,\AnewratecontrolschemeforH.264videocoding,"inProceedingsofIEEEInternationalConferenceonImageProcessing,2004,vol.1. [91] J.XuandY.He,\AnovelratecontrolforH.264,"inProceedingsofIEEEInternationalSymposiumonCircuitsandSystems,2004. 150

PAGE 151

S.KimandY.Ho,\RatecontrolalgorithmforH.264/AVCvideocodingstandardbasedonrate-quantizationmodel,"inProceedingsofIEEEInternationalConferenceonMultimediaandExpo,2004,vol.1. [93] E.LamandJ.Goodman,\AmathematicalanalysisoftheDCTcoecientdistributionsforimages,"IEEETransactionsonImageProcessing,vol.9,no.10,pp.1661{1666,2000. [94] Y.AltunbasakandN.Kamaci,\AnanalysisoftheDCTcoecientdistributionwiththeH.264videocoder,"inProceedingsofIEEEInternationalConferenceonAcoustics,Speech,andSignalProcessing,2004,vol.3. [95] A.ViterbiandJ.Omura,Principlesofdigitalcommunicationandcoding,McGraw-HillNewYork,1979. [96] D.Kwon,M.Shen,andC.Kuo,\RatecontrolforH.264videowithenhancedrateanddistortionmodels,"IEEETransactionsonCircuitsandSystemsforVideoTechnology,vol.17,no.5,pp.517{529,2007. [97] X.Li,A.Hutter,andA.Kaupa,\Ecientone-passframelevelratecontrolforH.264/AVC,"JournalofVisualCommunicationandImageRepresentation,vol.20,no.8,pp.585{594,November2009. [98] Z.Li,F.Pan,K.Lim,G.Feng,X.Lin,andS.Rahardja,\AdaptivebasicunitlayerratecontrolforJVT,"inJVT-G012-r1,7thMeeting,PattayaII,Thailand,2003. 151

PAGE 152

JunXuwasborninTianmen,Hubei,China.HegothisB.E.andM.S.degreesatHuazhongUniversityofScienceandTechnology,Wuhan,China,in2003and2006.HereceivedthePh.D.degreeinelectricalandcomputerengineeringfromUniversityofFlorida,Gainesville,FLinAugust2010.Hisresearchinterestsincludeimageprocessing,videocodingandmultimediacommunication. 152