
Advanced Video Processing Techniques in Video Transmission Systems


Material Information

Title:
Advanced Video Processing Techniques in Video Transmission Systems
Physical Description:
1 online resource (107 p.)
Language:
english
Creator:
Yuan, Zheng
Publisher:
University of Florida
Place of Publication:
Gainesville, Fla.
Publication Date:

Thesis/Dissertation Information

Degree:
Doctorate ( Ph.D.)
Degree Grantor:
University of Florida
Degree Disciplines:
Electrical and Computer Engineering
Committee Chair:
Wu, Dapeng
Committee Members:
Li, Xiaolin
Li, Tao
Chen, Shigang

Subjects

Subjects / Keywords:
video processing
Electrical and Computer Engineering -- Dissertations, Academic -- UF
Genre:
Electrical and Computer Engineering thesis, Ph.D.
bibliography   ( marcgt )
theses   ( marcgt )
government publication (state, provincial, territorial, dependent)   ( marcgt )
born-digital   ( sobekcm )
Electronic Thesis or Dissertation

Notes

Abstract:
In this dissertation, the author addresses four processing techniques in a video transmission system. First, we propose a homogeneous video retargeting approach that dynamically adapts a video from a large resolution to a small one, keeping the interesting contents without introducing artifacts. We then design a video summarization system that condenses a long video into a shorter version, allowing viewers to access content more efficiently. Experiments show that the proposed retargeting and summarization approaches outperform existing solutions. For the case of video packet loss during transmission, we propose a perceptual quality model that mimics the response scores humans give after watching the impaired videos. By identifying the visual-psychological factors that directly relate to human scores, the perceptual quality model achieves better correlation with the true human response. Finally, we address the perceptual-quality-based rate-distortion optimization problem in encoder design. We propose a piecewise approximation method to derive a perceptual rate-distortion model, based on which the best rate-distortion trade-off is achieved.
General Note:
In the series University of Florida Digital Collections.
General Note:
Includes vita.
Bibliography:
Includes bibliographical references.
Source of Description:
Description based on online resource; title from PDF title page.
Source of Description:
This bibliographic record is available under the Creative Commons CC0 public domain dedication. The University of Florida Libraries, as creator of this bibliographic record, has waived all rights to it worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law.
Statement of Responsibility:
by Zheng Yuan.
Thesis:
Thesis (Ph.D.)--University of Florida, 2013.
Local:
Adviser: Wu, Dapeng.

Record Information

Source Institution:
UFRGP
Rights Management:
Applicable rights reserved.
Classification:
lcc - LD1780 2013
System ID:
UFE0045769:00001


Full Text

ADVANCED VIDEO PROCESSING TECHNIQUES IN VIDEO TRANSMISSION SYSTEMS

By

ZHENG YUAN

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

2013

© 2013 Zheng Yuan

I dedicate this dissertation to my loving mother Xiaoli Xu, father Minxu Yuan, and other family members, Guihua Wang, Ting Yue, Lei Mu, Minxia Yuan, Guangyan Xu, Bingkun Yuan, Yulian Zhao and Zhenwu Xu, for your unlimited support and love.

ACKNOWLEDGMENTS

First of all, I would like to thank my PhD advisor, Dr. Dapeng Oliver Wu. With his broad knowledge of my research areas, Dr. Wu always provided me with strategic advice and helped me shape my research course wisely. He devoted a great deal of time and effort to exchanging and developing ideas with me about scientific research. His erudite yet easy-going style sets a role model for how I should conduct high-quality research in the future. I would like to thank Dr. Xiaolin Li, Dr. Tao Li and Dr. Shigang Chen for serving on my committee. They gave me important comments and suggestions on both my proposal and defense, so that I knew precisely where to make improvements. I appreciate their time and effort in reading my dissertation and attending my oral defense.

I would like to thank my colleagues and classmates: Dr. Taoran Lu, for helping me in many aspects when I first came to the US (I truly enjoyed our academic collaboration); Dr. Zhifeng Chen, for guiding me in the video streaming research topic; Dr. Qian Chen, Dr. Zongrui Ding and Dr. Yuejia He, for your valuable suggestions and support; and Dr. Jun Xu, Dr. Lei Yang and Dr. Bing Han, for the opportunities you offered. A special thanks to our departmental graduate coordinator, Ms. Shannon Chillingworth, for her always prompt and precise replies to my questions, which made my graduate academic life reassuring. I sincerely thank Dr. Yu Huang, Dr. Zhenyu Wu, Dr. Yuwen He, Dr. Jun Xin and Dr. Dongqing Zhang for your mentoring during my internships in industry. The hands-on experience I gained from you is significant to me. Special thanks are also extended to Dr. Miao Zhao from Stony Brook University and Dr. Tianyi Xu from the University of Delaware for your constructive advice during our discussions on my PhD research.

Finally, I thank my friends Chong Pang, Chao Li, Yang Lu, Steve Hardy, Huanghuang Li and others in Gainesville, and Lei Xu, Xiaohua Qian, Haiquan Wang and Ke Wang back in China. Because of them, my life during my PhD studies was colorful! I will always cherish your support!

TABLE OF CONTENTS

ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT

CHAPTER
1 INTRODUCTION
  1.1 Problem Statement
  1.2 The Scope of the Dissertation
  1.3 The Organization of the Dissertation
2 VIDEO RETARGETING
  2.1 Background
    2.1.1 Previous Methods
    2.1.2 Our Approach
  2.2 Global or Local? Statistical Study on Human Response to Retargeting Scale
  2.3 System Design
    2.3.1 System Architecture
    2.3.2 Design Principle
  2.4 Visual Information Loss
  2.5 The Non-linear Fusion Based Attention Modeling
    2.5.1 Spatial Saliency with Phase Quaternion Fourier Transform
    2.5.2 Temporal Saliency with Local Motion Intensity
    2.5.3 Nonlinear Fusion of Spatial and Temporal Saliency
  2.6 Joint Considerations of Retargeting Consistency and Interestingness Preservation
    2.6.1 Graph Representation of the Optimization Problem
    2.6.2 The Dynamic Programming Solution
  2.7 Optimal Selection of Scale in a Shot
  2.8 Experimental Results
    2.8.1 Spatial Saliency Modeling
      2.8.1.1 Proto-region detection results on spatial saliency map
      2.8.1.2 Subjective test of the attention modeling
      2.8.1.3 Computational complexity
    2.8.2 Attention Modeling Comparison in Video Retargeting
      2.8.2.1 Saliency video comparison
      2.8.2.2 Subjective test
    2.8.3 The Comparison of Video Retargeting Approaches
      2.8.3.1 Video and image snapshot comparison
      2.8.3.2 Subjective test
3 VIDEO SUMMARIZATION
  3.1 Background
  3.2 The Summarization Philosophy
  3.3 Recognize the Concepts
  3.4 The Summarization Methodology
    3.4.1 Criteria of a Good Video Summary
    3.4.2 Constrained Integer Programming
  3.5 Experiment Results
    3.5.1 An Example to Illustrate Our Algorithm
    3.5.2 Subjective Evaluation
4 PERCEPTUAL QUALITY ASSESSMENT
  4.1 Background
  4.2 The Construction of Training Sample Database
    4.2.1 The Weakness of Publicly Available Database
    4.2.2 Video Sample Generation
    4.2.3 Score Collection and Survey Conduction
    4.2.4 Analysis of the Survey Result
  4.3 Glittering Artifact Detection
    4.3.1 Edge Map Generation
    4.3.2 Structure Detection
    4.3.3 False Intersection Suppression
  4.4 Face Deformation Estimation
  4.5 Model Training to Map Psychological Factors to Perceptive Scores
  4.6 Experiments
5 PERCEPTUAL QUALITY BASED VIDEO ENCODER RATE DISTORTION OPTIMIZATION
  5.1 Background
  5.2 Perceptual RDO Framework by Piecewise Linear Approximation
  5.3 RD Sampling
  5.4 Local RD Curve Fitting
  5.5 Piecewise Envelope Generation
  5.6 Experiment Results
6 CONCLUSION

REFERENCES
BIOGRAPHICAL SKETCH

LIST OF TABLES

2-1 Confidence intervals by T² estimation with level 1 − α = 95%. Courtesy of the RetargetMe database by MIT [1] for the image retargeting benchmark.
2-2 Confidence interval analysis for subjective evaluation on spatial saliency modeling, α = 0.05. Courtesy of the image database by Xiaodi Hou [2].
2-3 Confidence interval analysis for subjective evaluation on video saliency modeling, α = 0.05. Courtesy of the video trace library [3].
2-4 Confidence interval analysis for video retargeting, α = 0.05. Courtesy of the video trace library [3].
3-1 The statistics of scores of four video clips. Video courtesy of [4-6].
4-1 The distribution and properties of the generated video clips with packet loss artifacts. I - interlace, C - chessboard, R - Region of Interest. Courtesy of the video trace library [3].
4-2 The questions for the video samples chosen by each viewer.
4-3 The comparison of the perceptual quality model in [7] and our proposed perceptual model.
5-1 Bitrate reduction (%) of our proposed RDO for inter-frame coding under Baseline profile. Courtesy of the video trace library [3].

LIST OF FIGURES

1-1 The architecture of a video transmission system.
2-1 Retargeted views: global vs. local.
2-2 Images for the statistics study of human response to retargeting scale.
2-3 Video retargeting system architecture.
2-4 Motion saliency comparison when global motion is correct or wrong.
2-5 Graph model for optimizing the crop window trace.
2-6 Comparison of saliency analysis on images.
2-7 Statistics for saliency maps by STB, CS, HC, RC and our method.
2-8 Two snapshots of video saliency modeling comparison between baseline PQFT [8] and our approach.
2-9 Statistical analysis for video saliency modeling.
2-10 Retargeting results for visual smoothness.
2-11 Statistical analysis for video retargeting approaches.
3-1 The concept of the video semantics.
3-2 An illustration of the Bag-of-Words feature.
3-3 Recognized concepts from the original video.
3-4 Video summarization results.
4-1 The degraded video due to transmitted packet loss events.
4-2 A packet-loss-degraded video frame with glittering blocks highlighted.
4-3 Artifact along MB boundary with two pixels on each side.
4-4 The result of edge location selection.
4-5 Block-wise perception vs. pixel-wise detection.
4-6 Artifact layout scenarios with their desired edge structures to detect.
4-7 The result of structural detection followed by false intersection suppression.
4-8 Visually perceived annoyingness vs. the contrast of the artifact against adjacency.
4-9 A deformed face region and its two subregions lost and copied from inconsistent referred areas.
4-10 The scatter plot of the predicted score vs. the actual score.
5-1 The block diagram of the perceptual quality based RDO system.
5-2 The RD samples resulting from varying λ for video Bus.
5-3 The RD performance comparison of an interior RD point with two RD points on the RD envelope.
5-4 The geometric characteristics of a local RD curve.
5-5 RD samples over different QP and varying Lagrange multipliers, videos Bus and Mobile.
5-6 The fitted model of local RD samples.
5-7 The piecewise approximation of the RD envelope by tangent line segments.
5-8 The common tangent line of two local RD curves.
5-9 Piecewise linear approximation of the RD envelope.
5-10 Perceptual RDO performance comparison.

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

ADVANCED VIDEO PROCESSING TECHNIQUES IN VIDEO TRANSMISSION SYSTEMS

By

Zheng Yuan

August 2013

Chair: Dapeng Wu
Major: Electrical and Computer Engineering

In this dissertation, we address four processing techniques in a video transmission system. First, we propose a homogeneous video retargeting approach that dynamically adapts a video from a large resolution to a small one, keeping the interesting contents without introducing artifacts. We then design a video summarization system that condenses a long video into a shorter version, allowing viewers to access content more efficiently. Experiments show that the proposed retargeting and summarization approaches outperform existing solutions. For the case of video packet loss during transmission, we propose a perceptual quality model that mimics the response scores humans give after watching the impaired videos. By identifying the visual-psychological factors that directly relate to human scores, the perceptual quality model achieves better correlation with the true human response. Finally, we address the perceptual-quality-based rate-distortion optimization problem in encoder design. We propose a piecewise approximation method to derive a perceptual rate-distortion model, based on which the best rate-distortion trade-off is achieved.

CHAPTER 1
INTRODUCTION

1.1 Problem Statement

The fast development of multimedia compression and network technologies enables a fast proliferation of video transmission applications, such as multimedia streaming (Video on Demand), video live broadcasting, video conferencing and video conversational applications. Although the detailed implementations of these systems may vary, the common architecture of a transmission system for video media typically consists of video acquisition and encoding modules on the sender side, a transmission channel with certain capacity in the middle, and video decoding and display modules on the receiver side. Fig. 1-1 shows the architecture of a video transmission system.

Figure 1-1. The architecture of a video transmission system.

The video acquisition component is a camera module that collects the pixel values of a scene. The encoding component compresses the pixel values into a much smaller version to facilitate transmission. In between, there may be a preprocessing module that adjusts the video content for efficient encoding. The compressed video is then packetized into media packets for transmission. During the transmission, a channel with transport protocols delivers the video packets. On the receiver side, a decoder depacketizes and decompresses the video packets into pixel values. Artifacts due to compression and channel-error events often occur; in these situations, the decoder must try its best to decode the possibly impaired bitstream, and a post-processing unit is usually involved to mitigate the artifacts. The display component presents the final content to viewers.

Two key values in designing a video transmission system are content adaptability and visual-quality-aware design. The main purpose of content adaptability is to distribute multimedia to the maximum reachable audience. Since the implementation of each component can vary across a very large spectrum, adapting the output of a particular component into an eligible or even optimal input of the next component can bring a seamless connection between the two. The ultimate goal of adaptability is ubiquitous access to media, in which a component can handle the output of any implementation of its logically connected component. The main motivation of visual-quality-aware design is to use an index that simulates human visual perception to measure how well a component of a video transmission system behaves, and thus to devise strategies to optimize that component. After all, it is up to humans to evaluate video transmission services; therefore, visual-quality-aware design may exploit human visual quality characteristics directly to enhance the performance of a transmission system, e.g., a source encoder.

1.2 The Scope of the Dissertation

This dissertation considers both values. For content adaptability, the author proposes two techniques, video retargeting and video summarization, which adapt the resolution and frame rate of the video source in the video acquisition component to those of the video display component. Video retargeting is a technique that intelligently adapts the original video to the resolution of the display. This technique enables viewers with small-screen devices to attend to the region of interest of the original video more effectively, at much less cost in artifacts. It can also be widely used in the video transcoding area, where an encoded bitstream needs to be converted to another bitstream matching a channel or receiver-side change. In parallel, video summarization is a technique that adapts the frame rate of the original video to another value preferred by the recipient. It can help viewers efficiently access the content of a Video-on-Demand system and can also be used in the video transcoding area. For video surveillance, which is a special live broadcasting application, video summarization may automatically detect unusual activities in the video and notify the recipient only upon detection.

For visual-quality-aware design, on the receiver side the author proposes a perceptual quality metric to measure the decoded video quality. On the sender side, the author addresses the essential problem of encoder design (the trade-off between bitrate and distortion) in the perceptual context and proposes a perceptual quality based encoder optimization method accordingly. In a video transmission system, the channel sometimes fails to deliver video packets from the sender to the recipient due to reasons such as reduced bitrate and network congestion. In these error events, although the decoder will still make its best effort to recover or conceal the corrupted video, it is not uncommon that artifacts can still be observed. In order to monitor the video quality, an automatic quality assessment method based on an artifact edge map and face detection is proposed on the decoder side. On the encoder side, we aim to achieve a bitstream at the lowest possible bitrate given a certain distortion level. This requires knowledge of the model of distortion versus bitrate. When the distortion is measured by a perceptual quality metric, such a model may not be easily derived from the perspective of information theory, as the distortion measure is of a different nature. The author instead proposes an experiment-based sampling method with a curve fitting technique to learn the model and thus achieve the best rate-distortion trade-off.

1.3 The Organization of the Dissertation

The remainder of this dissertation is organized as follows. Chapter 2 and Chapter 3 introduce the proposed video retargeting and video summarization techniques, respectively. Chapter 4 proposes the perceptual quality model to assess video quality during packet loss events, and Chapter 5 addresses the perceptual rate-distortion optimization in encoder design. Chapter 6 concludes the work of this dissertation.

CHAPTER 2
VIDEO RETARGETING

2.1 Background

Thanks to enriched communication network resources and efficient compression techniques, a diversity of mobile devices and streaming terminals have gained significant shares in multimedia access over the years. While these devices are designed with unique resolutions and aspect ratios, most media sources, when created, generally follow standard formats (e.g., resolution 1920x1080, 720x576). As the proxy for transferring video media across platforms, video retargeting techniques automatically adapt source video contents to fit the display size of target devices (generally from large to small). While they attempt to preserve the most Visual Interestingness (VI) for each individual frame, as required in the image retargeting task, they also demand temporal consistency among adjacent frames to generate visually agreeable retargeting results.

As video retargeting seeks to preserve the VI efficiently, it is sensible to understand how the VI distributes over pixels spatially and temporally. Attention modeling, as a self-contained research topic, provides such distribution information by mimicking the visual stimulus from each pixel of video frames. In the literature, many methods have been proposed, including Refs. [2], [8]-[11] and also [12], where multi-modalities are considered. Several aliases of visual interestingness include visual importance [13], visual energy [14] and saliency [9][2].

2.1.1 Previous Methods

Current video retargeting techniques are conceptually classified into two major categories: homogeneous vs. heterogeneous approaches. The homogeneous methodology formulates video retargeting as searching for a sequence of rigid retargeting windows on each original frame, followed by homogeneously resizing the contents within the window into the target display. Although the homogeneous methodology may sacrifice contents outside the retargeting window, this scheme allows a systematic regulation of retargeted pixels. Spatially, the selected pixels are treated equally to avoid geometric distortion, which is especially meaningful when the content contains well-defined or viewer-familiar objects such as human faces or architecture. Temporally, the retargeting consistency can be easily achieved by merely imposing constraints on the window parameters across neighboring frames.

Given that viewers exercise non-uniform attention response to stimulus from different pixels, as shown by visual psychophysical studies, Liu et al.'s auto-pan-scan [15] and Hua et al.'s search window method [16] look for a window that moves dynamically to secure the most visually interesting regions for each individual frame. Regarding the consistency issue this raises, they both utilize curve fitting to smooth the window parameters. Although this procedure alleviates shaky artifacts to some extent when visually interesting areas stay close together over frames, it cannot guarantee consistent retargeted views in the presence of rapid and irregular content changes. Instead, Deselaers et al. [17] consider visual interestingness preservation and retargeting consistency together as retargeting scores and trace back the best window sequence that maximizes the accumulated scores. This approach for the first time models the retargeting consistency issue as a built-in consideration and thus ensures the retargeted views are consistent. Nevertheless, this setup of maximizing the accumulated score may not quite suit VI preservation: without a pre-recognized destination directing the retargeting window to track the updated visually interesting areas, the calculated window sequence is anchored near its initial position due to the encouragement of an over-inertial transition among adjacent frames, which we name the consistency clamping artifact. This artifact is visible especially in long video, where complex background changes require the retargeting window to update to the current position. Also, if there is saliency inaccuracy in the initial frame, the identified parameters of the first RW are impaired. The impairment will propagate into all the following frames due to clamping. For long video retargeting, the propagation is even longer.

On the other hand, the heterogeneous methodology does not perform hard selection of a chunk of pixels continuously aligned in a rectangular window, but rather takes each pixel individually and imposes soft manipulation of potentially every pixel. This flexibility has made the heterogeneous methodology the focus of media retargeting in academia for many years, and it has developed primarily along two tracks. The first track focuses on seam carving: [14] shrinks the original image by eliminating pixels aligned in seams (continuous curves), giving priority to those with less visual energy. The same idea extends to video retargeting [18] by cutting 2D seam manifolds from a 3D video frame volume. Recently, Matthias et al. [19] imposed a discontinuity constraint on seam structures to alleviate the cutting of featured objects. [20] adopts a multi-operator scheme to choose seam carving when appropriate. The other track is warping based approaches: these do not explicitly remove pixels but rather morphologically squeeze them to various extents proportional to their visual importance. Wolf et al. [13] initialized this idea by formulating the mapping from an original pixel to its retargeted correspondence as a sparse linear system of equations. Wang et al. [21] then introduced the scale-and-stretch method to warp an image with matched local scaling factors. The authors [22] then expanded the same philosophy to video retargeting by incorporating motion-aware constraints. In practice, Krahenbuhl et al. [23] implemented a unique system based on warping for streaming applications.

In essence, the flexible pixel rearrangement of the heterogeneous methodology avoids explicit content sacrifice, suggesting a somewhat latent preference for a globally retargeted view. These methods produce excellent results on natural images and in scenarios where the aspect ratio change is significant. However, individual manipulation of pixels also demands that such a large number of pixel parameters be jointly optimized that there are always some pixels that are not coordinated well spatially or temporally. Therefore, it is common to observe resultant deformation and/or inconsistency, which can be quite noticeable or even disastrous when the content involves well-defined objects.

2.1.2 Our Approach

Motivated by the difference between the homogeneous and heterogeneous methodologies, our approach takes into account the major design considerations: a) content preservation, b) temporal retargeting consistency and c) prevention of deformation. We ask a big question: from the perspective of observers, what are the real priorities among these considerations for performing compelling video retargeting? For content preservation vs. non-deformation, since both are in-frame considerations, it is insightful to draw conclusions from image retargeting evaluations. Recently, Rubinstein et al. [1] created a benchmark image database and performed comprehensive statistical analysis on the human response to several state-of-the-art techniques from both the homogeneous and heterogeneous methodologies. They found that viewers consistently demonstrate high sensitivity to deformation, and in many cases users prefer sacrificing content over inserting deformation into the media [1]. The temporal retargeting consistency that stands out in video retargeting plays an even more important role: even a mild inconsistency among adjacent frames would lead to annoying flickering or jittering, which may fatigue human eyes quickly; whereas content sacrifice may be less tangible without the presence of the original video, since viewers are capable of recovering the content via imagination.

1) Our approach enables a user-specified retargeting scale, making the trade-off between visual content preservation and retargeting consistency appropriate for each individual viewer. Compared with heterogeneous approaches that assertively impose pro-global-view retargeting, or traditional homogeneous methods that impose pro-local-view retargeting, our refined approach is closer to viewers' real aesthetic preferences. This user-specific practice is inspired and justified by our study of human response to the retargeting scale.

Figure 2-1. Left: original image. Right top: retargeted image by pro-global view. Right bottom: retargeted image by pro-local view. Photos courtesy of the RetargetMe database by MIT [1] for the image retargeting benchmark.

2) Our video retargeting system is capable of processing long-duration video with generic contents, not limited to the realm of short video clips as in much existing work. Unlike short clips that last only one scene, long videos contain many scene changes, and the length of a scene can be very long or very short. The proposed system studies the temporal retargeting consistency comprehensively, which to the best of our knowledge is the first time this issue has been discussed at such an elaborate level. Considering that frames at different temporal locations require different consistency, our retargeting system adapts the trade-off structure to different frame types, aiming to secure the most freedom for saliency preservation. On the whole, this framework bridges the miscellaneous retargeting considerations in practice with a structuralized analysis of retargeting objectives and endows the generic video retargeting problem with elegant mathematical formulations.

3) As the substantial algorithm of the video retargeting system, we formulate the retargeting as an optimization problem, where the variables to solve are the sequential positions of the retargeting windows over a subshot. Regarding the objective function, we propose the volume retargeting cost metric to systematically consider the retargeting consistency and VI preservation together. We further represent the optimization in a graph context and then prove the equivalence of the optimization to searching for the path with minimal total cost on the graph. The solution is obtained in dynamic programming fashion. It is encouraging that the solution may extend to other measures of visual interestingness preservation and consistency refined in the future.

4) For the attention modeling, we propose an innovative computational model with non-linear fusion of spatial and motion channels. The proposed model with independent channels captures the distinct mechanisms of human perception of luminance, chrominance and motion stimulus, avoiding the shape twist of salient entities caused by the joint processing of intertwined spatial-temporal data in many attention models. Also, the non-linear fusion scheme takes advantage of the properties of the computed visual interestingness distributions from the two channels and strives for the detection of meaningful entities, which subsequently makes the suggested interesting objects more likely to be preserved in the retargeting window.

This chapter is organized as follows. Section 2.2 describes the statistical study of human response to retargeting scales. Section 2.3 presents the proposed video retargeting system architecture. Section 2.4 describes a visual information loss metric to measure interestingness preservation and Section 2.5 proposes a non-linear fusion based attention model. Section 2.6 proposes a volume retargeting cost metric and the corresponding graph representation with the dynamic programming solution. Section 2.7 presents a method to choose a unified scale for a shot. Section 2.8 shows our experimental results.

2.2 Global or Local? Statistical Study on Human Response to Retargeting Scale

This section studies how viewers evaluate retargeting scales. Ideologically, most heterogeneous approaches follow the hypothesis that a global scale is preferred in the retargeting task, while most current homogeneous approaches tend to retarget content at a local scale (they may also be pro-global scale if the aspect ratio change is not drastic). Considering both the merits and weaknesses of the two methodologies, we inquire into the validity of these hypotheses by examining whether there really exists a consensus of perception bias towards a particular scale, either global or local. This inquiry is inspired by the subjectivity of human aesthetics, the randomness of image content, the retargeting purpose and many other non-objective factors, which probably suggest there is no consensus of preference that makes one scale dominate the other, and thus both hypotheses are opposed.

Note that although preserving the original image at a global scale is intuitively justified among many viewers, and is then enforced by most existing works, a significant number of people alternatively consider it not strictly necessary if it comes at the cost of geometric distortions or inconsistency over adjacent frames. They prefer a nice local retargeting that enhances the emphasis on the object of interest. For example, in Fig. 2-1, the top right image is produced by purely scaling the original image globally without any content removal, and the bottom right image is obtained by first cropping a local region and then scaling to fit the target display, but with a smaller scaling factor. As expected, many viewers we surveyed claimed the top is better for its complete content preservation; however, other viewers argued that it is reasonable to cut off the boundary regions, as the green twigs there are not visually important and not even intact in the original frame, while the cropped retargeting renders the sunflower at finer resolution, a perception advantage of the bottom image.

To clear up the disagreement, we conduct a statistical study (http://www.mcn.ece.ufl.edu/public/ZhengYuan/statistical_study_retargeting_scale.html) to test which hypothesis of retargeting scale is true, aiming at a potential computational measure for the supported hypothesis. Otherwise, if the study suggests neither hypothesis can be statistically supported to dominate the other, we may leave the freedom of choosing a retargeting scale to the individual viewer.

Figure 2-2. Up left to bottom right: Art Room, Music Sound, DKNY Girl, Redtree, Trees, Football, Sunflower, Architecture, Butterfly, Canal House, Perissa, Child, Fish, Greek Wine, Fatem. Each original image is in one of the three common aspect ratios (4:3, 3:2 and 16:9) and is retargeted to all three aspect ratios. They are represented as thumbnails due to the conflict between various sizes and the space limitation. Our survey website provides the retargeting results in their authentic sizes. Photos courtesy of the RetargetMe database by MIT [1] for the image retargeting benchmark.

We devise the statistical experiment as follows: given 15 different images that cover the most popular topics in photography, we retarget each image with two distinct scales, which represent the pro-global-view and pro-local-view strategies, respectively. We also incorporate the target aspect ratio as a variable in our statistical study: each of the 15 images is in one of three common display aspect ratios, 3:2, 4:3 and 16:9; for each image, we retarget it into all three aspect ratios with the two retargeting scales. Then we collect response scores from 60 viewers to evaluate the retargeted images. The response scores are rated according to individual aesthetic standards, ranging from 1 to 10 with an increment of 1. Our objective is to determine whether there is a meaningful gap in the response scores between the two retargeting strategies.

Footnote: Here we use two methods to implement the global-scale retargeting, global cropping-and-scaling and the heterogeneous method in [21]. The former introduces no shape deformation, although its modest cropping may remove small border areas; the latter keeps all contents with the least shape deformation. We assign the higher score from the two methods as the score for global retargeting to measure its best perception performance, since a single method can hardly be deformation-free while keeping all contents as well.

Statistically speaking, given the grouped score samples X_ik = [x_ik1, ..., x_ikj, ..., x_ikn], i = 1:60, k = 1:2, n = 15 x 3 = 45, where i is the viewer index, j is the image index, k is the strategy/group index and n is the number of retargeting processes, we want to infer whether the subjective scores suggest retargeting equivalence between the two groups. Retargeting equivalence is statistically defined as the group mean difference δ = X̄_1 − X̄_2 being bounded within the subspace

    H = { δ : −1 ≤ δ_j ≤ 1, j = 1, ..., n }    (2-1)

We utilize confidence interval estimation to solve the problem. Assume the collected response scores X_ik of each group follow a multivariate Gaussian distribution with mean μ_k and variance-covariance matrix Σ,

    X_ik ~ N(μ_k, Σ),  ∀ i, k    (2-2)

Hence, the difference of the two group means also follows a Gaussian distribution with mean μ_1 − μ_2 and covariance Σ_δ,

    δ ~ N(μ_1 − μ_2, Σ_δ)    (2-3)

Given the confidence level 1 − α, we estimate the corresponding interval Ĥ in which δ probably lies, according to the distribution in Eq. (2-3). If Ĥ is a subspace of H, we consider the two strategies retargeting-equivalent. If Ĥ falls entirely outside H on the side favoring the local strategy, it suggests that viewers prefer local retargeting. If Ĥ falls entirely outside H on the side favoring the global strategy, it is highly likely that viewers prefer global retargeting. Otherwise, no significant preference bias exists.

Table 2-1 reports the 95% confidence interval of the group mean difference for each dimension/image, using T² confidence interval analysis. As indicated in Table 2-1, nine retargeting processes result in confidence intervals suggesting a preference for global retargeting. Meanwhile, another six retargeting processes have confidence intervals suggesting that local retargeting is preferred. For the remaining 30 retargeting processes, the intervals indicate either retargeting-scale equivalence or no significant preference bias. Therefore, if we take retargeting generically, there is no consensus on the preference for some particular scale. This conclusion suggests that in the retargeting task, one does not have to preemptively preserve a global view or a local view. Based on this inference, our proposed system gives individual users the freedom of choosing the global view, the local view, or some other scale in between, according to their own aesthetic preferences and needs. This strategy maximizes the retargeting performance by allowing the greatest flexibility. The corresponding interface of a scale optimizer is designed in a try-and-error fashion to facilitate viewers in determining a more customized viewing scale.
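For concreteness, the interval test above can be sketched in code. The following is a minimal per-image approximation, assuming hypothetical score arrays for one retargeting process; Table 2-1 uses simultaneous T² intervals over all 45 processes, so the exact interval widths there will differ. All names below are ours, not the dissertation's.

    import numpy as np
    from scipy import stats

    def preference_interval(global_scores, local_scores, alpha=0.05):
        """Confidence interval for the mean score difference
        (global - local) on one retargeting process; a per-image
        approximation of the T^2 analysis behind Table 2-1."""
        d = np.asarray(global_scores, float) - np.asarray(local_scores, float)
        n = d.size
        half = stats.t.ppf(1 - alpha / 2, n - 1) * d.std(ddof=1) / np.sqrt(n)
        lo, hi = d.mean() - half, d.mean() + half
        if lo > 1:
            verdict = "global preferred"
        elif hi < -1:
            verdict = "local preferred"
        elif -1 <= lo and hi <= 1:
            verdict = "retargeting equivalent"
        else:
            verdict = "no significant bias"
        return lo, hi, verdict

    # Hypothetical scores from 60 viewers for one image/aspect-ratio pair.
    rng = np.random.default_rng(0)
    g = rng.integers(5, 10, 60)   # scores for the pro-global result
    l = rng.integers(4, 9, 60)    # scores for the pro-local result
    print(preference_interval(g, l))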

2.3 System Design

This section discusses the design methodology in detail. We propose a practical retargeting system that adopts the homogeneous methodology, with the retargeting process equivalent to searching for the scale and position of a proper retargeting window for each frame. In our refined homogeneous approach, we let viewers determine their aesthetically pleasing scale; hence the corresponding removal of the periphery region based on the customized scale should be considered reasonable for the individual viewer. In particular, the homogeneous methodology allows a retargeting window (RW) to contain any region, even up to full inclusion.

Table 2-1. Confidence intervals by T² estimation with level 1 − α = 95%. Courtesy of the RetargetMe database by MIT [1] for the image retargeting benchmark.

                        3:2                4:3               16:9
  Image          Lbound  Ubound     Lbound  Ubound     Lbound  Ubound
  ArtRoom         -1.00   -0.02      -0.78    0.23      -2.81   -1.64
  DKNYGirl        -0.40    0.58      -0.26    0.57      -0.15    1.08
  Child           -0.79    0.86       0.78    1.64      -0.42    0.55
  Butterfly       -0.09    0.59       1.12    1.85       1.11    1.60
  Greenwine       -1.95   -1.10      -2.51   -1.70      -1.05   -0.41
  CanalHouse       1.04    2.71      -0.80    0.69      -0.37    0.49
  Sunflower       -0.28    0.85      -0.28    0.69      -0.89    0.11
  Fatem           -2.59   -1.66      -1.47    0.61      -1.19    0.59
  Fish             0.10    0.90       1.55    2.36      -0.52    0.84
  Perissa         -3.42   -2.03      -2.03   -0.98      -2.22   -0.94
  MusicSound       0.17    1.16      -0.07    0.79       0.84    2.05
  Football        -3.36   -2.67      -1.94   -1.16      -1.38   -0.47
  Trees           -1.08    0.79      -0.99    1.19       0.03    0.94
  Architecture    -1.05    0.52      -0.89    0.01      -0.87    0.34
  Redtree          0.28    1.32      -0.30    0.55      -0.11    0.72

2.3.1 System Architecture

Fig. 2-3 describes the system architecture and explains the workflow of finding the scale and location of the RW for a frame. The system consists of six major components: a) Shot Detection, b) Saliency Calculation, c) Scale Optimization, d) Visual Information Analysis, e) Boundary-Frame Retargeting and f) Inner-Frame Retargeting. The shot detection module divides a long video into visually coherent units for subsequent independent retargeting. The saliency detection module implements the attention model we propose in Section 2.5 to quantize the interestingness of each pixel in the frame. The scale optimization module evaluates an entire shot and then determines a single optimal scale of the RWs for the shot. The Visual-Info Analysis module transforms the saliency distribution into the potential visual information loss incurred by the RW at all possible locations. The Boundary-Frame Retargeting module searches for the best location of the RW for the boundary frames. Finally, the Inner-Frame Retargeting module takes the inner frames altogether and determines the dynamic trace of the RWs over them.

Figure 2-3. Video retargeting system architecture. Rectangle block: system modules. Elliptical block: video or image frame.

As a new frame comes in, the shot detection module compares the statistics of its luminance and chrominance with previous frames and decides whether a new shot is initiated [24][25]. At the same time, the saliency calculator takes the incoming frame and generates a saliency distribution map of the same frame size. Then all frames in the detected shot, with their saliency maps, are streamed into the scale optimizer to find a unified viewing scale for the RWs of the entire shot. Once the size of the RW is aesthetically determined with the optimized scale, the Visual-Info Analyzer computes the potential loss due to the cropping and scaling incurred by the RW at all possible locations for each frame. Finally, the optimal locations of the RWs are searched in two cases: for the frames at the boundary of a subshot (the subshot generator chops a shot into several even-length subshots), the Boundary-Frame Retargeting module finds the RW by only minimizing the visual-info loss of the individual frame; for other within-subshot frames, the Inner-Frame Retargeting module considers the sequential frames jointly and determines a smooth trace of RWs.

2.3.2 Design Principle

The proposed system is designed this way in order to comprehensively address visual consistency, rather than treating it as a trivial post-processing procedure. It is sensible to notice that frames at different temporal locations raise distinct extents of consistency requirements. Utilizing this non-uniformity, we exercise customized retargeting schemes that specialize the trade-off between the required consistency and visual information preservation. This flexibility commits to the most exploitable visual-info preservation as long as the consistency requirement is satisfied. The consistency considerations are summarized as follows.

1) If a frame is asserted to be the boundary of a shot, normally a clear cut is inserted before the next frame. Instantaneous content switching in the original video is prevalently used and looks natural to viewers, symbolizing the start of a new episode. Hence, no visual consistency concerns are required in the retargeted video either. For the same reason, a shot can be processed independently as the retargeting unit.

2) When a frame is within a shot, any abrupt RW change over two adjacent frames would result in a very noticeable jitter against the similar visual contents among adjacent frames. However, for the subshot boundary frames, we may consider only the VI preservation, because the consistency requirement pertaining to them will be resolved by their two immediate neighbors in 3). This strategy actually keeps an intermittent but timely update of where salient entities in the original frame go, rendering a retargeted video that is very likely to contain the interesting parts. This update mechanism avoids the clamping and prevents possible inaccurate retargeting due to saliency inaccuracy from propagating to the next updated frame.

3) When a frame belongs to the inner-subshot frames, the two competitive objectives should be jointly considered. In order to obtain a visual-info-keeping but also smooth trace of RWs linking the optimal locations of the two subshot boundary frames, we stack the inner-subshot frames together as a volume and minimize the total visual information loss of the volume under the constraint that any RW transits over adjacent frames with curvature less than a bound.

In addition, we fix one unified scale of RWs for the entire shot, given that human perception is susceptible to scale variation of RWs over frames with similar contents, known as the flicker artifact. As for the locations, on the contrary, a certain amount of translation is tolerable to human perception. Actually, it is the translation of neighboring RWs that makes the tracking of interesting content possible. The real challenge is how to configure the allowed translation carefully, so as to smoothly catch up with the movement of the interesting region of the original video.

2.4 Visual Information Loss

As mentioned above, our retargeting system aims at the best possible preservation of visual interestingness while also permitting retargeting consistency. This section discusses the measure of the former: the mathematical quantification of the visual-info loss caused by a retargeting window with candidate parameters. The measure is implemented in the Visual-Info Analysis module of the system. The visual-info loss closely relates to the attention model in Section 2.5, as the pixels manipulated by the RW carry distinct interestingness. It comes from two sources: 1) the cropping loss: the retargeting system removes the periphery areas outside the RW, so that viewers cannot attend to the contents therein; 2) the scaling loss: the cropped content is further downsampled with scale s to be exactly the retargeting size, which degrades the original frame to a coarser resolution.

Suppose the attention distribution ρ(x, y) corresponding to a frame is available; the cropping loss is then measured as the accumulated interestingness of the discarded pixels,

    L_c = 1 − ∑_{(x,y)∈W} ρ(x, y)    (2-4)

Here, ρ(x, y) is the normalized saliency map such that ∑_{(x,y)} ρ(x, y) = 1, and W is the retargeting window.

The measure of the scaling loss is more challenging, as the original frame cannot be compared directly with the retargeted counterpart due to their distinct sizes. [16] compares the original frame with the one obtained after applying a low-pass Gaussian filter, which presumably substitutes for the real retargeted frame. However, this heuristic does not exactly match the real operation in the retargeting scheme: downsampling. In our measure, we generate a synthesized frame that contains exactly the same visual information as the retargeted frame but is still the same size as the RW. To guarantee the visual-info equivalence of the synthesized frame with the retargeted frame, we upsample the synthetic frame by mapping each pixel in the retargeted frame repetitively to the successive 1/s x 1/s pixels in the synthetic frame. When 1/s is not an integer, bilinear interpolation is adopted. The scaling loss is then defined as the squared difference between the synthesized frame and the content of the original frame within the RW,

    L_s = ∑_{(x,y)∈W} ( F(x, y) − F̂(x, y) )²    (2-5)

where F̂ = upsizing( g ∗ downsizing(F, s), s ).

The visual information loss is thus the combination of the two sources,

    L(x, y) = (1 − λ) L_c(x, y) + λ L_s(x, y)    (2-6)

where λ is a factor that balances the importance of content completeness and resolution, adjustable to the user's preference. Given a λ, we may find the optimal RW parameters (x̂, ŷ, ŝ) by minimizing the visual information loss measure,

    P(x̂, ŷ, ŝ) = argmin_{x,y,s} L(x, y, s·W_t, s·H_t)    (2-7)

where W_s, W_t, H_s, H_t are the widths and heights of the source and target frames, respectively. Note the search range of (x, y, s) is constrained by 0 ≤ x ≤ W_s − s·W_t, 0 ≤ y ≤ H_s − s·H_t. Therefore, for the subshot boundary frames, the RW is finalized by (x̂, ŷ, ŝ) and the retargeted frame is generated immediately by zooming the RW out by ŝ times. Note that the scale of the RW is fixed within a shot (see Section 2.7); therefore, for subshot boundary frames, we only search along (x, y) to minimize the visual information loss measure.
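As a concrete reading of Eqs. (2-4) through (2-7), the sketch below scores one candidate RW on a grayscale frame. It is an illustration under our own assumptions (OpenCV resizing for the downsize/upsize round trip, and a mean rather than a raw sum in the scaling term to keep the two losses on comparable scales); the function and parameter names are ours, not the dissertation's.

    import numpy as np
    import cv2

    def visual_info_loss(frame, saliency, x, y, s, wt, ht, lam=0.5):
        """Loss of a retargeting window at (x, y) with scale s (Eq. 2-6).
        frame: grayscale image; saliency: same-size map summing to 1."""
        w, h = int(round(s * wt)), int(round(s * ht))   # RW size in source pixels
        win = frame[y:y + h, x:x + w].astype(np.float64)
        # Cropping loss (Eq. 2-4): saliency mass left outside the window.
        lc = 1.0 - saliency[y:y + h, x:x + w].sum()
        # Scaling loss (Eq. 2-5): round-trip through the target resolution.
        small = cv2.resize(win, (wt, ht), interpolation=cv2.INTER_AREA)
        synth = cv2.resize(small, (w, h), interpolation=cv2.INTER_LINEAR)
        ls = np.mean((win - synth) ** 2)   # mean instead of raw sum (our choice)
        return (1.0 - lam) * lc + lam * ls

Sweeping (x, y) over the valid range 0 ≤ x ≤ W_s − s·W_t, 0 ≤ y ≤ H_s − s·H_t and taking the argmin reproduces Eq. (2-7) for a boundary frame once the shot scale s is fixed.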

2.5 The Non-linear Fusion Based Attention Modeling

Attention modeling computes the meaningful saliency distribution used to evaluate how much a retargeting window preserves the visual interestingness. Among a diversity of modeling methods, Guo et al. [8] take advantage of the Quaternion Fourier Transform [26] to extract saliency in the frequency domain, a principled approach free of the sensitivity of laborious parameter tuning. Meanwhile, [8] shows that it is in fact the phase spectrum of an image that captures its most salient features. Hence, it is proper to substitute the entire spectrum residue (Hou et al. [2] show the spectrum residue is an efficient way to detect saliency) with only the phase spectrum, for succinctness. In this dissertation, we inherit the merit of the Phase Quaternion Fourier Transform (PQFT) to detect spatial saliency, but compute temporal saliency separately. We argue that it is not justifiable, as in [16], to mix the three spatial images (one luminance and two chrominance) together with one temporal (motion) image into the quaternion and derive the saliency from the interweaved spatiotemporal data. Since humans perceive spatial and temporal stimulus through distinct psychophysical mechanisms, treating the spatial and temporal channels jointly with a unified transform would produce a saliency model that twists the actual spatiotemporal interaction. Instead, we first simulate the spatial saliency with PQFT and the temporal saliency with the local motion intensity. We then fuse them nonlinearly to mimic the human responses to various spatial and temporal saliency distribution scenarios.

2.5.1 Spatial Saliency with Phase Quaternion Fourier Transform

Spatial saliency, to a large extent, originates from the stimulus of pixels with high contrast in luminance and chrominance against their neighborhoods. The high contrasts are largely captured mathematically by the phase spectrum, while the Quaternion Fourier Transform provides a nice principled approach to calculate the phase spectrum of a color image. In YCbCr color space, a video frame I(x, y, t) is represented as three independent scalar images, Y(x, y, t), Cb(x, y, t) and Cr(x, y, t), where x, y are the location of a discrete pixel on the frame, t is the index of the frame in temporal order, Y is the luminance component and Cb and Cr are the two chrominance components.

A quaternion, generally a hypercomplex number, is q = a + b·μ_1 + c·μ_2 + d·μ_3, where a, b, c, d are real-valued numbers, and

    μ_i² = −1,  μ_i ⊥ μ_j for i ≠ j,  μ_1 μ_2 = μ_3    (2-8)

Reorganize q in the symplectic form as the combination of two complex numbers,

    q = f_1 + f_2 μ_2,  f_1 = a + b μ_1,  f_2 = c + d μ_1    (2-9)

Then the Quaternion Fourier Transform can be performed by two standard fast Fourier transforms (F_1 and F_2), where

    Q(u, v, t) = F_1(u, v, t) + F_2(u, v, t) μ_2    (2-10)

and F_i(u, v, t) is the Fourier transform of f_i(x, y, t). Assume a = 0 and substitute b, c, d with Y, Cb, Cr respectively; we may then represent the video frame I(x, y, t) as a pure quaternion frame q(x, y, t),

    q(x, y, t) = Y(x, y, t) μ_1 + Cb(x, y, t) μ_2 + Cr(x, y, t) μ_3    (2-11)

Then apply Eq. (2-10) to calculate the quaternion transform Q_I(u, v, t) of the quaternion frame q(x, y, t); its phase spectrum P_I(u, v, t) is derived by dividing Q_I by its norm |Q_I|, i.e., P_I = Q_I / |Q_I|. Finally, we take the inverse quaternion Fourier transform of P_I(u, v, t) to get the phase quaternion image q_p(x, y, t), and the spatial saliency s(x, y, t) is obtained by smoothing the squared L2 norm of q_p(x, y, t) with a two-dimensional Gaussian smoothing filter g,

    s = g ∗ ||q_p(x, y, t)||²    (2-12)
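Because of the symplectic split in Eq. (2-9), the PQFT reduces to two standard complex FFTs. Below is a minimal NumPy/SciPy sketch of Eqs. (2-9) through (2-12); the smoothing bandwidth and the small epsilon guard are our choices, not values from the dissertation.

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def pqft_spatial_saliency(Y, Cb, Cr, sigma=3.0):
        """Phase-only quaternion FT saliency (Eqs. 2-9 to 2-12).
        With a = 0 the quaternion frame splits symplectically into two
        complex planes: f1 = Y*mu1 and f2 = Cb + Cr*mu1."""
        f1 = 1j * Y.astype(np.float64)            # Y on the mu1 axis
        f2 = Cb.astype(np.float64) + 1j * Cr.astype(np.float64)
        F1, F2 = np.fft.fft2(f1), np.fft.fft2(f2)
        mag = np.sqrt(np.abs(F1) ** 2 + np.abs(F2) ** 2) + 1e-12
        # Keep phase only: Q / |Q|, then invert both complex planes.
        q1, q2 = np.fft.ifft2(F1 / mag), np.fft.ifft2(F2 / mag)
        sal = np.abs(q1) ** 2 + np.abs(q2) ** 2   # squared quaternion norm
        return gaussian_filter(sal, sigma)        # s = g * ||q_p||^2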

2.5.2 Temporal Saliency with Local Motion Intensity

Generally, the disparity of temporally adjacent pixels comes from both camera motion and local object motion. While camera motion applies to every pixel of the image globally, the local motion embodies the actual contrast of a pixel against its neighborhood. Hence, the local motion reflects the temporal saliency. We first use the Kanade-Lucas-Tomasi (KLT) tracker to obtain a set of matched good feature points [27] between one video frame and its neighboring frame, and then estimate the global motion parameters [28] with an affine model. The local motion is extracted after removing the global motion from the temporal disparity.

Denote the disparity of a KLT feature point (x_{t−1}, y_{t−1}) in the previous frame I_{t−1}, matched with x = (x_t, y_t) in the current frame I_t, as d = (d_x, d_y)^T. The disparity can be approximated by a six-parameter affine global motion model d = Dx + t, where t is the translation component t = (t_x, t_y)^T and D is a 2x2 rotation matrix. Representing the affine model in terms of the matched feature points, x_{t−1} = A x_t + t, where A = E + D and E is the 2x2 identity matrix. The motion parameters in t and D can be estimated by minimizing the total neighborhood dissimilarity of all the matched features,

    {Â, t̂} = argmin_{A,t} ∫_W ( I_t(A x + t) − I_{t−1}(x) )² dx    (2-13)

where W denotes an 8x8 neighborhood of the feature points. We adopt the Least Median of Squares approach to estimate the affine parameters robustly [28]. We generate the global-motion-predicted frame by warping the current frame I_t(x, y, t) with the estimated parameters Â and t̂. The absolute difference (after smoothing) of the predicted frame from the previous frame I_{t−1}(x, y, t) reflects the intensity of local motion, i.e., the temporal saliency,

    m = g ∗ | I_{t−1}(x) − I_t( Â⁻¹ [x − t̂] ) |    (2-14)
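The following is a minimal sketch of the global-motion-compensated residual in Eqs. (2-13) and (2-14), using OpenCV's KLT tracker and robust affine estimation. OpenCV's cv2.estimateAffine2D with the LMEDS flag plays the role of the Least Median of Squares fit in [28]; the feature-count and smoothing parameters are our placeholders.

    import numpy as np
    import cv2

    def temporal_saliency(prev_gray, curr_gray, sigma=3):
        """Local-motion intensity map (Eq. 2-14): residual after removing
        the estimated affine global motion between two grayscale frames."""
        # KLT: good features in the current frame tracked into the previous one.
        pts = cv2.goodFeaturesToTrack(curr_gray, maxCorners=400,
                                      qualityLevel=0.01, minDistance=8)
        prev_pts, status, _ = cv2.calcOpticalFlowPyrLK(curr_gray, prev_gray,
                                                       pts, None)
        good_curr = pts[status.ravel() == 1].reshape(-1, 2)
        good_prev = prev_pts[status.ravel() == 1].reshape(-1, 2)
        # Affine model x_{t-1} = A x_t + t, fitted robustly (LMedS).
        M, _ = cv2.estimateAffine2D(good_curr, good_prev, method=cv2.LMEDS)
        # Warp the current frame into the previous frame's coordinates.
        h, w = curr_gray.shape
        pred = cv2.warpAffine(curr_gray, M, (w, h))
        resid = cv2.absdiff(prev_gray, pred).astype(np.float64)
        return cv2.GaussianBlur(resid, (0, 0), sigma)   # m = g * |residual|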

2.5.3 Nonlinear Fusion of Spatial and Temporal Saliency

The actual strategies adopted by human attention when fusing the spatial and temporal saliency components are rather complex, depending on the particular distributions of spatially and temporally salient areas. It is noticeable that humans are likely to attend to video frames on pixel clusters, which may suggest the existence of a meaningful entity, rather than being attracted to a solitary pixel. In a huge variety of videos, salient entities almost universally express high spatial saliency values, since most videos are captured, by either professionals or amateurs, to promote a certain foreground target. For example, when a director shoots a TV show, the actor or actress at the center of focus is often depicted with unique characteristics to distinguish them from the background or others. Furthermore, an entity also demonstrates high motion saliency if it happens to move. However, since the entity may also not move in a number of scenarios, motion saliency does not carry as much discriminative power as spatial saliency. Hence, aiming at a fused saliency map featuring high confidence and stability in suggesting the existence of an entity, we make the spatial saliency the primary cue and the motion saliency secondary.

On the other hand, we observe that the detected spatially salient area (obtained by thresholding the spatial saliency map) is generally continuous and concentrated, since the smoothing procedure in Eq. (2-12) is suited to the detection of high-spatial-contrast regions rather than individual pixels. This trait nicely brings out the existence of an underlying salient entity. In order to further distinguish the entity from other spatially salient areas, we resort to the correlation with the motion saliency map and increase the saliency value of the spatially salient areas by an amount proportional to the corresponding motion saliency value. The reason this measure works is that both the spatial and motion saliency are driven by the same entities; hence their intersection area suggests the probable existence of a real salient entity. For example, the bee in Fig. 2-4 represents a salient entity that viewers usually attend to. As indicated in the spatial saliency map, the high-saliency area indeed covers where the bee is located, but also admits other areas that are considered spatially salient. In this case, the calculated motion saliency map concentrates on, and suggests, where the bee is. So the motion saliency is able to improve the conspicuousness of the bee by increasing the saliency value of the bee area of the spatial saliency map.

Figure 2-4. Comparison of linear combination, naive MAX operation and the proposed approach when global motion is correct or wrong. Photos courtesy of the video trace library [3] for YUV video sequences.

However, since motion saliency is computed at the pixel level, the detected salient area can be rather dispersed, as pixels in scattered locations may contribute competitive local motions such that none of them stands out. In this case, the motion saliency map cannot suggest reliable and dominant salient entities, although the related pixels are indeed in high local motion. Fortunately, the spatial saliency map is not affected and thus can be utilized as a filter to confine the existence of the salient entity within the spatially salient area, and the pixels in such areas are still allowed to be enhanced by their motion saliency value. For example, in Fig. 2-4, the motion-salient areas spread over the entire image, capturing neither of the two anchors (the salient entities). However, the spatial saliency map successfully concentrates on the two anchors. Therefore, the final saliency map may use the spatially salient area to filter the motion saliency and only increase the saliency of pixels covering the anchors.

Based on the analysis above, we devise the following nonlinear fusion scheme,

    S(x, y, t) = max{ s(x, y, t), [M ∩ m](x, y, t) },
    M = { (x, y, t) : s(x, y, t) ≥ ε }    (2-15)

The intersection of M and the motion saliency map simulates the filtering of the spatially salient areas over the temporal saliency. Within the intersection, the max operator uses the motion saliency value to intensify the areas where the two cues agree. This operator utilizes the characteristics of the two saliency maps and produces a saliency map that emphasizes salient entities.
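The fusion rule in Eq. (2-15) is a masked max. Below is a minimal sketch, composable with the two saliency functions above; the normalization step and the threshold choice for ε are our placeholders, since the dissertation does not state its value here.

    import numpy as np

    def fuse_saliency(s_spatial, m_motion, eps=None):
        """Nonlinear fusion (Eq. 2-15): motion saliency may only raise
        the score inside the spatially salient mask M = {s >= eps}."""
        s = s_spatial / (s_spatial.sum() + 1e-12)   # make channels comparable
        m = m_motion / (m_motion.sum() + 1e-12)     # (our assumption)
        if eps is None:                             # placeholder threshold
            eps = s.mean() + s.std()
        mask = s >= eps                             # M: spatially salient area
        fused = s.copy()
        fused[mask] = np.maximum(s[mask], m[mask])  # max{ s, M ∩ m }
        return fused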

PAGE 36

conictingtrade-offs.Ononehand,ifwendRWsbymerelyminimizingintra-framevisuallossforeveryframe,theresultantRWindeedalwaystracksthemostupdatedinformation-richareas.However,thoseareasdonotnecessarilymovewithconsistentpatternsandsowiththesearchedRWs.Eventuallywemayendupwithavideocontaminatedwithannoyingjitters.Ontheotherhand,ifweonlydesiretheabsoluteretargetingconsistency,thepositionsofaRWduringtheentiresubshotshouldbexedasotherwiseanynon-staticRWwouldintroduceanarticialglobalmotionintotheretargetedvideo.Nevertheless,thestaticRWinthissituationisunabletotrackthedynamicvisualinformationrichareasandpreservethem.Keepinmindthattopermitretargetingconsistency,itisimpossiblefortheRWofeachframeindividuallytoattainitsbestpositionwithlocalminimalvisual-infoloss.Hence,thejointconsiderationofthetwoobjectivesrequiresustotreattheentireinner-subshotframestogether.WethusproposethevolumeretargetingcostmetricinEq.( 2 )toevaluatetheretargetingofawholesubshot, Lv=NXt=1L(xt,yt)+!NXt=1D(xt,yt,xt)]TJ /F4 7.97 Tf 6.58 0 Td[(1,yt)]TJ /F4 7.97 Tf 6.59 0 Td[(1)(2)whereD(xt,yt,xt)]TJ /F4 7.97 Tf 6.59 0 Td[(1,yt)]TJ /F4 7.97 Tf 6.59 0 Td[(1)=jxt)]TJ /F3 11.955 Tf 12.91 0 Td[(xt)]TJ /F4 7.97 Tf 6.59 0 Td[(1,yt)]TJ /F3 11.955 Tf 12.9 0 Td[(yt)]TJ /F4 7.97 Tf 6.58 0 Td[(1j2isthedifferentialoftheRWtraceattheframet,measuringtheretargetinginconsistencytherein.ListhevisualinterestingnesslosswiththesameinterpretationasinEq.( 2 ).!isthetrade-offfactorofthetwoobjectivesandNisthetotalnumberofinnerframesinasubshot.Thevolumeretargetingcostmetricfeaturesthetotalvisual-infolossplustotalretargetinginconsistencyandemphasizessearchingforadynamicRWtracefortheentiresubshot.Whenthosetemporallyadjacentframesarestackedtogether,theminimizationofthevolumemetricexploresacongurationoftheRWpositionswithlowtotalcost,forcingtheinformationlossandretargetinginconsistencyregardingeachindividualframelowaswell.Thismetricisarelaxationofindividualvisualinterestingnesspreservationwhenmingledwiththeretargetingconsistencyconcern. 36


Figure 2-5. Each frame corresponds to a layer. Green: source and destination vertexes. Yellow: candidate vertexes for each frame. Red: the path with least cost, denoting the optimized dynamic trace. Photos courtesy of the movie Madagascar by DreamWorks Animation [29].

Furthermore, in order to guarantee the retargeting consistency, we explicitly add a constraint that the norm of the differential of the RW trace at each frame should be less than a value. Therefore, the search for the best trace of RWs is formulated as the following optimization problem,

\{\hat{x}_t, \hat{y}_t\}_{t=1}^{N} = \arg\min_{\{x_t, y_t\}_{t=1}^{N}} L_v(x_t, y_t) \quad \text{s.t.} \quad D(x_t, y_t, x_{t-1}, y_{t-1}) \leq \varepsilon    (2)

where \varepsilon is a psychophysical threshold below which human attention can tolerate view inconsistency.

2.6.1 Graph Representation of the Optimization Problem

The solution to the optimization problem in Eq. (2) is not trivial due to two aspects: 1) the arguments in the optimization are the trace of the RWs, i.e. the sequential positions of RWs in high dimension; 2) the objective function may be non-linear and even non-convex due to the nature of the computed saliency distribution. Thus, regular analytical or computational methods may not work in this situation; however, we observe it is meaningful to represent the optimization in a graph and explore the solution in that context.


As depicted in Fig. 2-5, we construct a graph with vertexes across N layers. Each layer symbolizes an inner frame from 1 to N within a subshot, and each vertex on one layer represents a possible position of the RW to be decided. We assign a cost value to each vertex as the visual information loss incurred by the RW at the corresponding position. Then, for every pair of vertexes on adjacent layers, if the norm of the differential between them is less than the bound \varepsilon, we establish an edge to link them together, suggesting a possible transition of the RW over adjacent frames. This construction ensures the retargeting consistency constraint is satisfied. We also assign each edge a cost value equal to the norm of the differential between the two vertexes on its two ends. Specifically, the source and destination vertexes are the positions of the RWs on the two boundary frames of the subshot, which are obtained by minimizing the visual information loss only. In this graph, any path from the source to the destination vertex is a possible trace of the RWs over the subshot. We define the cost of a path as the total cost of the vertexes and edges on it. Evidently, the cost of a path denotes the volume retargeting cost metric in Eq. (2), and the solution to the constrained optimization problem is the path with the minimal cost.

2.6.2 The Dynamic Programming Solution

We propose a dynamic programming method to find the path with minimal cost. Suppose the optimal path from the source vertex s to the jth vertex v_i^j on the ith layer is s -> v_i^j; the question is how to find the optimal paths to all the vertexes on the next layer. For the kth vertex v_{i+1}^k on layer i+1, denote by V the set of vertexes on layer i that have an edge linking to v_{i+1}^k. Then the way to find the best path to v_{i+1}^k is to augment the optimal paths up to every vertex in V with v_{i+1}^k and choose the one with the minimal updated cost. Therefore, the recursive form of the objective function in Eq. (2) is as follows,


L_v(s \to v_{i+1}^k) = \min_{v_i^j \in V}\{L_v(s \to v_i^j) + \omega D(v_i^j, v_{i+1}^k)\} + L(v_{i+1}^k)    (2)

where s = v_1 and L_v(s \to v_1) = 0. L_v(s \to v_i^j) denotes the minimized volume retargeting cost up to frame i assuming v_i^j is the destination, or equivalently the shortest path from the source vertex s of frame 1 to the jth vertex of frame i. L_v(s \to v_{i+1}^k) is the shortest path up to the kth vertex of frame i+1, D(v_i^j, v_{i+1}^k) denotes the cost of the edge connecting the jth vertex of frame i to the kth vertex of frame i+1, and L(v_{i+1}^k) is the cost of the kth vertex of frame i+1. Notice that both the source s and the destination d are predetermined by minimizing Eq. (2) for the boundary frames. Starting from s, we update the best path up to the vertexes on each layer from 1 to N. In the end, over all the vertexes on layer N, we find the one through which the best path leads to d,

v_N^k = \arg\min_{v_N^j} L_v(s \to v_N^j) + \omega D(v_N^j, d)    (2)

Then the best path leading to v_N^k is the final solution to the optimization in Eq. (2).
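Because each layer only connects to its neighbor, the recursion above is a shortest-path computation over a layered graph. The following is a minimal sketch under the assumptions that every layer stores its candidate RW positions with precomputed visual-information losses, and that the first and last layers each hold the single source and destination vertexes fixed by the boundary frames; all names and containers are illustrative, not the dissertation's implementation.

#include <vector>
#include <limits>

struct Vertex { double x, y, loss; };   // candidate RW position and its cost L

// Squared RW displacement between positions on adjacent frames (the edge cost D).
static double D(const Vertex& a, const Vertex& b) {
    return (a.x - b.x) * (a.x - b.x) + (a.y - b.y) * (a.y - b.y);
}

// layers[i] = candidate vertexes of frame i. Returns, per layer, the index of
// the vertex on the minimal-cost trace.
std::vector<int> bestTrace(const std::vector<std::vector<Vertex>>& layers,
                           double omega, double epsilon2 /* = epsilon^2 */) {
    const double INF = std::numeric_limits<double>::infinity();
    size_t n = layers.size();
    std::vector<std::vector<double>> cost(n);
    std::vector<std::vector<int>> back(n);
    cost[0] = { layers[0][0].loss };                 // single source vertex s
    back[0] = { -1 };
    for (size_t i = 1; i < n; ++i) {
        cost[i].assign(layers[i].size(), INF);
        back[i].assign(layers[i].size(), -1);
        for (size_t k = 0; k < layers[i].size(); ++k)
            for (size_t j = 0; j < layers[i - 1].size(); ++j) {
                double d = D(layers[i - 1][j], layers[i][k]);
                if (d > epsilon2) continue;          // consistency constraint
                double c = cost[i - 1][j] + omega * d + layers[i][k].loss;
                if (c < cost[i][k]) { cost[i][k] = c; back[i][k] = (int)j; }
            }
    }
    // Backtrack from the single destination vertex d to recover the RW trace.
    std::vector<int> trace(n);
    int k = 0;
    for (size_t i = n; i-- > 0; ) { trace[i] = k; k = back[i][k]; }
    return trace;
}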


2.7 Optimal Selection of Scale in a Shot

As mentioned before, a unified scale is chosen for the entire shot. It determines the actual size of the retargeting window and reflects the aesthetic preference of a particular viewer. In the minimization of the information loss function in Eq. (2), the chosen scale depends on the cropping-scaling trade-off factor \lambda. Our proposed system allows viewers to initialize it, giving the system a general idea of whether to generate a local or a global retargeting view, or somewhere in between. Based on the initial \lambda, we find the optimal scale by setting the partial derivative of the visual-info loss function with regard to the scale equal to zero. We perform this operation on each frame in the shot and average the obtained scales as the unified scale of the shot.

The size of the RW definitely affects how quickly the RW responds to dynamic content change. It is enlightening to think of the size of a RW as the gears of the manual transmission in a car, the transition rate of RWs between adjacent frames as the tachometer, and the change of dynamic content as the road condition. We desire to maintain the transition rate within a pleasing level for consistency concerns. Meanwhile, the featured contents are supposed to be tracked and preserved no matter how quickly they change. Just as one chooses a higher gear for a high speed if the road condition permits, when the salient entities move rapidly, it is sensible to choose a larger RW size to satisfy the aforementioned two desires. On the contrary, when the salient entities move slowly, we may switch to a smaller RW to save resolution. Therefore, given the initial \lambda which settles the aesthetic preference, we tune it afterwards to better suit the content change. We use the velocity at which the RW transits (predicted by the initial \lambda) to estimate how fast the salient entities move. Then, based on the velocity estimate, we adjust the weight in order to obtain a more suitable scale, which in turn resizes the RW to track the salient entities more wisely.

\lambda_0 = \frac{\lambda}{1 + e^{-\left(\frac{1}{N}\sum_{i=1}^{N}\frac{(v_i - v_{i-1})^2}{\Delta_i^2} - \bar{v}\right)}}    (2)

where \frac{1}{N}\sum_{i=1}^{N}\frac{(v_i - v_{i-1})^2}{\Delta_i^2} is the velocity estimate of the moving RW and N is the total number of frames in the shot. \Delta_i is the maximum distance a RW can move from v_{i-1} at frame i-1, and \bar{v} denotes a reference velocity which humans find pleasing. Given the updated weight \lambda_0, a new optimal average scale is calculated for the shot.
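A minimal sketch of this update rule follows, treating the RW trace as one-dimensional for brevity and assuming the per-frame positions v_i and bounds \Delta_i are available from the prediction pass; the function and parameter names are illustrative assumptions.

#include <cmath>
#include <cstddef>
#include <vector>

// v[0..N]: predicted RW trace of the shot; delta[i]: max admissible move at
// frame i; vbar: reference velocity; lambda: initial trade-off factor.
double updateWeight(double lambda, const std::vector<double>& v,
                    const std::vector<double>& delta, double vbar) {
    std::size_t N = v.size() - 1;
    double est = 0.0;                       // velocity estimate of the RW
    for (std::size_t i = 1; i <= N; ++i) {
        double d = (v[i] - v[i - 1]) / delta[i];
        est += d * d;
    }
    est /= (double)N;
    // Logistic squashing against the reference velocity: fast-moving salient
    // entities keep the factor near lambda, slow ones shrink it, leading to a
    // smaller retargeting window that saves resolution.
    return lambda / (1.0 + std::exp(-(est - vbar)));
}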


2.8 Experimental Results

We design three groups of experiments, 1) spatial saliency modeling, 2) video saliency modeling and 3) video retargeting, to demonstrate the effectiveness and efficiency of the proposed attention modeling method and the proposed retargeting system. For evaluation convenience, the testing video sequences are clips of 1 to 2 minutes in length. They cover many genres, have multiple scenes and complex backgrounds, and sufficiently demonstrate the validity of our retargeting system. Also, in order to test the performance of our method on long videos, we retargeted two 10-minute movies, Big Buck Bunny and Elephants Dream, to variant resolutions and aspect ratios. They are full movies from Open Movie projects without license issues. In all three experiments, we compare our schemes with representative existing approaches. Our schemes are implemented in C++ and Matlab with the open source libraries OpenCV (http://opencv.willowgarage.com/wiki/), FFTW3 (http://www.fftw.org/) and KLT (http://www.ces.clemson.edu/~stb/klt/). Since attention modeling and the entire video retargeting process are highly subjective tasks, we summarize the experimental results in two fashions, including image snapshots and online videos to provide readers with a visual comprehension of the proposed schemes, and subjective tests of the viewer perception scores.

2.8.1 Spatial Saliency Modeling

2.8.1.1 Proto-region detection results on spatial saliency map

Fig. 2-6 shows the attention modeling results (saliency maps) generated by humans, the saliency toolbox (STB, http://www.saliencytoolbox.net/), context-aware saliency (CS) [30], Histogram-based Contrast (HC) [31], Region-based Contrast (RC) [31] and our attention modeling algorithm, respectively. We use a collection of images in multiple resolutions and aspect ratios for the experiments.


Besides the resultant saliency maps, we also illustrate the so-called proto-regions [32], which are found by thresholding the normalized saliency maps with the same threshold [2], to show the contours of salient entities in the contents, as shown in the red-circled regions on the original images in Fig. 2-6. Note that in rigid video retargeting, it is the shapes of the detected salient objects and whether they are accurately distinguished from the background that matter most, because the salient objects, as integrated entities, are intended to be kept in the retargeting window. A good saliency map here not only captures all the pixels of the salient objects, but also avoids capturing too much detail from the background, which weakens the emphasis on the salient objects. Therefore, we use the similarity of the proto-regions to the ground-truth attended objects (generated by us as viewers in Col. 2 of Fig. 2-6) to measure the effectiveness of each method.

Compared with the saliency maps of STB, our algorithm successfully detects more pixels on the salient objects, and thus our proto-regions have much more similarity in shape to the ground truth. For example, for image 4, which shows two children playing at a beach with a sailing boat in the sea, our algorithm successfully extracts the regions of the children and the boat as salient objects, while STB is only able to capture the line between the sea and the sky. As for HC and RC, these methods produce a saliency map that emphasizes regions, which is suitable for image segmentation (e.g. image 1 and image 7 of Fig. 2-6). However, we observe that occasionally the objects of interest, although wholly extracted as regions, are not very salient compared with other regions. For example, for HC, the house region in image 3 is considered less salient than the flower region in both methods, and the saliency of the old lady's face is overshadowed by the wall nearby; for RC, the children in image 4 are less salient than the sea surface. For the comparison with CS, both CS and our algorithm detect all the pixels of the salient objects. Indeed, CS captures more details in both the salient objects and the background, which potentially counterbalances the importance of the salient objects. In image 2, the CS results show both the house and the ground are captured, resulting in a proto-region that includes many unnecessary background regions.


Figure 2-6. Col. 1: original image. Col. 2: human labeled salient regions. Col. 3: proto-regions detected by STB. Col. 4: saliency map by STB. Col. 5: proto-regions detected by CS. Col. 6: saliency map by CS. Col. 7: proto-regions by HC. Col. 8: saliency map by HC. Col. 9: proto-regions by RC. Col. 10: saliency map by RC. Col. 11: proto-regions detected by our method. Col. 12: saliency map of our method. Photos courtesy of the image database by Xiaodi Hou [2].

2.8.1.2 Subjective test of the attention modeling

We conduct a subjective test to collect scores ranging from 1 to 5 graded by 60 participants (http://www.mcn.ece.ufl.edu/public/ZhengYuan/spatial_saliency_comparison.html). Then we conduct a confidence interval analysis to calculate where the mean score of each method lies with 95% probability. The results of the confidence interval analysis are summarized in Table 2-2 and Fig. 2-7 is the bar chart. It is shown that all images except image 7 and image 9 exhibit the consistent behavior that the confidence intervals of STB do not overlap with ours. Since our method has a higher mean score than STB, this suggests that our method outperforms STB.


Figure 2-7. Statistics for saliency maps by STB, CS, HC, RC and our method.

Comparing with HC and RC, except for image 2 and image 4, the confidence intervals of our method overlap with those of the two methods but generally occupy slightly higher ranges for the other images. The possible reason is that the saliency of the objects of interest by HC and RC may be diluted by the color variations nearby. For the comparison with CS, it seems that for most images the confidence intervals overlap. Generally, in terms of the grading on the similarity of the detected salient objects to the ground truth, the performances of our method and the CS method are similar. Sometimes, the two methods still demonstrate a potential performance gap. In image 5, the confidence interval of CS is higher than ours. However, in image 4, our confidence interval is higher. The possible reason is that in the saliency map by CS, too many details on the two children are extracted. Thus the same threshold level which perfectly distinguishes the shape of the sailing boat results in two oversized children. This weakened salient object due to over-extracted details also explains the score comparison on image 8.

2.8.1.3 Computational complexity

We test the time complexity of the five methods on Windows 7, 64 bit, with a 2.67 GHz CPU and 4 GB memory. The computation times are also given in Table 2-2. As indicated by the computation time, our method is faster than all the other four methods; in particular, our computational efficiency is about 1000 times that of CS, and thus our method is suitable for a real-time video retargeting system.


Figure 2-8. The first and fourth rows: original video sequences Rat and Sunflower. The second and fifth rows: the saliency modeling results of baseline-PQFT. The third and sixth rows: the saliency modeling results of our method. Photos courtesy of the video trace library [3].

2.8.2 Attention Modeling Comparison in Video Retargeting

2.8.2.1 Saliency video comparison

Here we focus on the attention modeling unit and compare the proposed non-linear fusion modeling with the baseline-PQFT [8]. We present the saliency comparisons on six videos: Chicken, Rat, News, Sink, Jets and Sunflower. The comparisons are in the form of live videos with baseline-PQFT side by side. Please visit http://www.mcn.ece.ufl.edu/public/ZhengYuan/saliency_comparison.html to watch them. Due to space limitations, Fig. 2-8 presents the snapshots of two representative saliency video results (Rat and Sunflower). As both the online videos and Fig. 2-8 show, the salient object in Rat detected by our method has a shape and gesture that resemble the rat (the salient object in Rat) more closely than that of baseline-PQFT. The same holds for the bee in Sunflower. The reason is that in our non-linear fusion modeling, the fusion of the two channels emphasizes the potential existence of salient objects rather than individual salient pixels, thus resulting in better shapes of the detected salient objects. Baseline-PQFT feeds the spatial and motion data directly into the PQFT calculation. Since the values are different in nature, the shapes of the detected salient objects may be twisted.

2.8.2.2 Subjective test

We also carry out a subjective test for the saliency modeling comparison.


Figure 2-9. Green: saliency by baseline-PQFT. Purple: ours.

Like the spatial saliency evaluation, we collect the scores from 60 participants and perform the confidence interval analysis of the mean scores for the two saliency modeling methods. Table 2-3 presents the estimated intervals where the two mean scores lie for each video, and Fig. 2-9 shows the bar chart. From Fig. 2-9, we can see that for Chicken, Rat, News and Sunflower, the interval of our method does not overlap with that of baseline-PQFT and our interval is higher. For Jets and Sink, although our confidence intervals have a small overlap with baseline-PQFT, they still occupy a higher range. This suggests that participants generally consider our saliency modeling better than baseline-PQFT in terms of salient object detection.

2.8.3 The Comparison of Video Retargeting Approaches

2.8.3.1 Video and image snapshot comparison

We present video retargeting comparisons on 6 videos: Barnyard, Fashion Show, Hear Me, Madagascar, Rat and Soccer. They are 1 to 2 minutes in length with multiple scene changes and complex backgrounds. We perform video retargeting with our retargeting system and with two previous homogeneous methods: Single Frame Smoothing (SFS) [15][16] and Back Tracing (BT) [17]. Each original video has an aspect ratio between 1/2 and 3/4 with a width of more than 600 pixels. The retargeted output sizes are 320x240 and 480x240, so in the retargeting process the videos are both squeezed and stretched. We demonstrate the video comparison results at http://www.mcn.ece.ufl.edu/public/ZhengYuan/video_retargeting_comparison.html.


From all the videos, we may generally observe that SFS suffers from jittering, which causes uncomfortable feelings for viewers. Back Tracing is mostly acceptable; however, the retargeted video is not always able to preserve the regions of interest of the original video. In comparison, our method preserves the salient region throughout as the frames advance and avoids jitter effects as well (occasionally it may include modest unnatural camera motion, which can be alleviated by increasing \omega in Eq. (2)). Due to space limitations, Fig. 2-10 captures the snapshots of two representative video results, Madagascar (retargeted to 320x240, aspect ratio squeezed) and the Fashion Show (retargeted to 480x240, aspect ratio stretched). For the authentic performance comparison, readers may still refer to our website for observation. In the results of SFS, although the lion and the zebra are preserved completely, the retargeting window shifts back and forth frequently, which indicates huge jitter effects in the retargeted video. Regarding BT, from the 1st to the 3rd frame the retargeting window includes the complete zebra; however, as the frame goes to the 4th and the 5th, it is left behind by the zebra due to the fast motion of the latter. So most parts of the zebra are lost in the retargeted video. In contrast, our result yields a visually consistent retargeting window trace that preserves the zebra completely. For the Fashion Show, BT results in a retargeting window that does not include the model's face as she moves across frames; in comparison, for the corresponding frames, our results alleviate the consistency clamping artifact and mostly keep the model's face inside. SFS retargeting still has a shaky artifact; viewers may perceive it through the online video results.

2.8.3.2 Subjective test

Since it is obvious from the retargeted videos that both BT and our approach are better than SFS, we only need to evaluate BT and our approach quantitatively.


Figure 2-10. Two snapshots of the video retargeting comparison among SFS, BT and our approach. The first and fourth rows: single frame search and smoothing. The second and fifth rows: back tracing. The third and sixth rows: our proposed approach. Photos courtesy of the movie Madagascar by DreamWorks Animation [29] and the runway model video by Michael Kors at YouTube [33].

For each retargeting process, we collect subjective scores ranging from 1 to 5 from 60 participants, as in the previous tests. Based on the collected scores, we perform a confidence interval analysis to compare the two methods according to where their true mean scores lie. Table 2-4 summarizes the confidence intervals for each video being retargeted to 320x240 and 480x240. The bar chart in Fig. 2-11 illustrates the relative locations of the mean scores. From Table 2-4 and Fig. 2-11, it seems that for all the retargeting processes where the aspect ratio is stretched, the confidence intervals of our method are higher than those of BT, although sometimes they overlap with BT by a small percentage.


Figure 2-11. Statistical analysis for video retargeting approaches. Green: Back Tracing. Purple: ours.

For the retargeting processes where the aspect ratio is squeezed, the confidence intervals of our method indeed overlap with those of BT; however, it seems that they still occupy higher ranges. These subjective test results suggest that our method is generally better than BT, especially in the case where the aspect ratio is stretched.


Table 2-2. Confidence interval analysis for the subjective evaluation on spatial saliency modeling, alpha = 0.05. Courtesy of the image database by Xiaodi Hou [2].

       Method  LBound  Mean  UBound  Time
Img1   STB     2.68    2.94  3.21    97.0 ms
       CS      3.34    3.59  3.83    129608 ms
       HC      3.49    3.69  3.90    5323 ms
       RC      3.26    3.49  3.73    35501 ms
       Ours    3.24    3.48  3.72    10.9 ms
Img2   STB     1.86    2.11  2.36    103.2 ms
       CS      3.39    3.66  3.92    74972 ms
       HC      2.34    2.58  2.82    2069 ms
       RC      2.62    2.86  3.10    25971 ms
       Ours    3.25    3.52  3.79    12.1 ms
Img3   STB     1.87    2.13  2.39    95.7 ms
       CS      3.36    3.64  3.92    105808 ms
       HC      3.19    3.42  3.66    1076 ms
       RC      2.99    3.21  3.44    20531 ms
       Ours    3.34    3.57  3.80    10.4 ms
Img4   STB     1.79    2.00  2.20    79.8 ms
       CS      3.08    3.32  3.56    96148 ms
       HC      2.61    2.95  3.29    385 ms
       RC      2.45    2.64  2.83    1831 ms
       Ours    3.12    3.42  3.71    14.0 ms
Img5   STB     2.65    2.91  3.17    100.4 ms
       CS      2.97    3.32  3.66    37638 ms
       HC      3.08    3.25  3.42    1445 ms
       RC      2.81    3.08  3.36    7529 ms
       Ours    2.85    3.18  3.51    73.3 ms
Img6   STB     2.72    2.99  3.26    78.8 ms
       CS      3.48    3.84  4.20    106181 ms
       HC      3.35    3.69  4.04    428 ms
       RC      3.27    3.64  4.02    2215 ms
       Ours    3.50    3.81  4.12    9.8 ms
Img7   STB     2.95    3.25  3.54    98.3 ms
       CS      3.29    3.53  3.76    172269 ms
       HC      3.26    3.47  3.68    5561 ms
       RC      3.28    3.50  3.73    220497 ms
       Ours    3.32    3.56  3.79    12.4 ms
Img8   STB     3.20    3.48  3.75    94.6 ms
       CS      3.46    3.71  3.95    130905 ms
       HC      3.31    3.56  3.82    765 ms
       RC      3.66    3.92  4.19    5170 ms
       Ours    3.75    4.04  4.32    8.5 ms
Img9   STB     2.85    3.13  3.41    100.0 ms
       CS      3.21    3.49  3.76    49442 ms
       HC      3.38    3.64  3.91    5757 ms
       RC      3.36    3.61  3.87    44585 ms
       Ours    3.25    3.53  3.79    11.7 ms


Table 2-3. Confidence interval analysis for the subjective evaluation on video saliency modeling, alpha = 0.05. Courtesy of the video trace library [3].

            Method         LBound  Mean  UBound
Chicken     Baseline-PQFT  2.54    2.93  3.06
            Ours           3.28    3.65  4.02
Rat         Baseline-PQFT  2.73    2.97  3.21
            Ours           3.67    3.97  4.26
News        Baseline-PQFT  1.82    2.24  2.66
            Ours           3.36    3.60  3.84
Sink        Baseline-PQFT  3.52    3.84  4.16
            Ours           3.86    4.13  4.39
Jets        Baseline-PQFT  3.16    3.46  3.75
            Ours           3.38    3.65  3.92
Sunflower   Baseline-PQFT  2.35    2.64  2.93
            Ours           3.92    4.24  4.56

Table 2-4. Confidence interval analysis for video retargeting, alpha = 0.05. Courtesy of the video trace library [3].

                          BT                  Ours
Video     Retarget size   LB    Mean  UB      LB    Mean  UB
Barn      320x240         3.36  3.60  3.84    3.65  3.85  4.05
          480x240         3.64  3.90  4.15    3.78  4.07  4.36
Fashion   320x240         3.25  3.53  3.80    3.58  3.85  4.12
          480x240         3.17  3.47  3.76    3.92  4.22  4.51
Hearme    320x240         3.69  3.88  4.06    3.82  4.04  4.26
          480x240         3.46  3.74  4.02    3.75  4.07  4.39
Madaga    320x240         3.31  3.64  3.97    3.52  3.86  4.20
          480x240         3.55  3.82  4.08    3.83  4.13  4.42
Soccer    320x240         3.18  3.48  3.77    3.43  3.67  3.91
          480x240         3.34  3.64  3.86    3.66  3.92  4.17


CHAPTER 3
VIDEO SUMMARIZATION

3.1 Background

The fundamental purpose of video summarization is to epitomize a long video into a succinct synopsis, which allows viewers to quickly grasp the general idea of the original video. The resultant summary provides a compact representation of the original content structure. Although brief, a good summary preserves all necessary hallmarks of the original video, and viewers are sufficiently able to recover the original content through reasoning and imagination.

To date, there are excellent surveys on video summarization, abstraction and skimming [34, 35]. These papers cover many detailed approaches with one common strategy: formulated as an optimization problem, the method selects a subset of video units (either static frames or dynamic shot clips) from all possible units in the original video such that they maximize some metric function of the summary quality. Based on the cognitive level (signal or semantic) where a metric function lies, we categorize current summarization techniques into three types.

Type I utilizes signal level features to measure the difference of a video summary from its original. Various implementations include the motion trajectory curve [36], visual redundancy [37], visual centroid [38], inter-frame mutual information [39], similarity graph [40] and summarized PSNR [41]. All these metrics are manipulations of pure intensities and in essence measure the visual diversity contained in a summary. Hence the maximization leads to the summary with the most content diversity, but not necessarily the one that presents the most important clues enhancing viewers' understanding.

Type II is characterized by high level semantic analysis, in which semantic events with explicit meanings are detected and the resultant semantic structure is utilized to guide the summarization. Generally, semantics are defined explicitly by some offline database, which annotates its entries with meaningful tags.


Through supervised learning from labeled data, various methods in this category detect events with explicit meanings. Typical implementations include the recognition of emotional dialogues and violent actions.

Type III lies at the intermediate level, seeking entities with implicit meanings. The philosophy is that implicit semantic entities also suffice for viewers to understand and recover the original plot while avoiding the heuristic attempts at explicit semantic recognition. Some researchers in [42-45] assume the semantics are implicitly expressed by popular human perception models [46, 47] and yield summaries with the most salient video units. Unfortunately, salient features are not necessarily semantically distinguishable, as they basically measure how interesting a video is, while the interesting part may or may not be an important clue for understanding.

In this dissertation, we propose an intermediate level approach to explore the implicit semantics of the original video. We pursue a self-explanatory video summary through discovering and preserving concepts. Concepts symbolize the abstract clues on which viewers base their recovery of the original plot thread; although on the intermediate level, they are patterns learned from the original video and represent implicit semantic meanings, not in an explicit and rigorous manner but with less need for generalization. The motivation for concepts is intuitive: emulating the human cognitive process, a list of key patterned hints such as characters, settings, actions and their orders is naturally needed in the short summary for viewers to stitch these hints together logically and use imagination to fill in the omitted parts. The concepts correspondingly encode the semantic meanings of the patterned hints. Specifically, we extract visual features and use spectral clustering to discover concepts, and we consider the repetition of shot segments which instantiate the same concept as summarization redundancy. We further analyze the criteria of a good summary and formulate the summarization as an integer programming problem. A ranking based solution is given. The main contributions of this chapter are therefore:


Figure 3-1. The top row: shot instances. The middle row: concepts. The bottom row: an understandable summary by presenting concepts. Photos courtesy of the TV show Big Bang Theory by CBS Broadcasting, Inc [4].

(i) An in-depth analysis of the philosophy of video summarization that answers the key questions: what are the semantically important elements and redundancies, and how to keep the semantics in the summary. (ii) A clustering based method to discover the semantic structure of the video as concepts and instances. (iii) A ranking based method for summary generation, which is a scalable framework that adapts the detail preservation to the summary length.

3.2 The Summarization Philosophy

In order to capture the video semantics, the first question is how the semantics are expressed in the video. Observing many meaningful videos, we find that the major semantics embedded are a plot outline these videos try to express. Generally, the outline can be abstracted as a sequence of entity combinations (e.g. in Fig. 3-1, the four persons, as four entities, are combined together to have a conversation). Each entity combination forms a particular concept that exhibits certain semantic implications.


Here, a concept differs from the cast list [48] in that the constituent entities are not necessarily faces or actors but may include any entity that is distinctive enough to characterize the concept. Also, the combination is not a simple list of entities, but emphasizes their interaction: through certain postures and actions of the entities, the semantic is vividly implied within the concept. These concepts shape the course of the plot and thus are the skeleton of the video. Furthermore, each concept is materialized by multiple video shots (we call them instances of a concept). Each shot instance, as the muscle of the video, elaborates the concept from a specific perspective (e.g. in Fig. 3-1, the concept of having a conversation is expressed by shot 1 to shot 17; some instances depict one person while others depict another, and together they convey that four persons are having a conversation). However, since the instances portray the same concept, they are destined to include redundancies in the summarization sense. Therefore, the concepts can be seen as a parsimonious abstraction of the video semantics, which suits our summarization purpose very well: by eliminating unnecessary instances with semantic redundancies but keeping the concepts intact in the summary, we shorten the long video without compressing the semantics. The preservation of a concept may be fulfilled by keeping, with higher priority, the instances with more distinguishable characteristics for expressing the concept.

From the viewer's perspective, if viewers are informed of the semantic skeleton, they are able to refill a full body of the video outline using reasoning: they witness the concepts and the entities within, and concatenate them into a plot thread based on experience or even imagination, thus understanding the video. For example, in Fig. 3-1, when all concepts in the middle row are available in the summary, viewers are capable of realizing that the four people are having a conversation. Therefore, the video summarization consists of two phases: exploring the inherent concept-instance structure, followed by preserving the prioritized instances in the summary.


3.3 Recognize the Concepts

In our context, discovering concepts is an unsupervised learning problem. We first divide a full video into multiple shots based on signal level change detection and regard each shot as a sample. Then we use an unsupervised learning method to cluster the samples into several classes. Each class of samples, collectively, expresses a concept.

For feature extraction, the concept defined as an entity combination implies that the samples within the same concept/class share a similar interaction of entities. Thus, a feature vector capturing the constituent entities of a shot is required, preferably with robustness to variations of entities due to postures, scale and locations. For that purpose, we propose the bag-of-words (BoW) model as the feature in this task. The BoW model [49] was initially utilized in natural language processing to represent the structure of a document. It regards a document as a collection of certain words from a reference dictionary but ignores their order. In our context, the Bag is a shot instance regarded as a container and the Words are the visual entities inside. The BoW model provides a macro look at the semantic ingredients within an instance and emphasizes the conceptual similarity among instances, considering entity attendance only. In Fig. 3-2, a far view of a rose and a butterfly and a close-up look at the same entities should both imply the same semantics, despite the rose and butterfly appearing at different locations or scales. Evidently, the BoW feature graciously expresses this order-irrelevant property and measures the two images with similar values.

The next question is how to construct the BoW feature for each shot. For the representation of words, ideally it should feature a perfectly extracted entity. However, due to the limitations of current object detection techniques, we alternatively use the SIFT feature points [50] within the detected salient objects [47] to represent an entity/words. Parsing throughout the original video, all occurring words constitute a dictionary, which describes a space of all entities occurring in the video. In order to distinguish the significant entities, we compress the words into codewords, which convey characteristic entities after dimension reduction. For each sample/shot, we count the number of occurrences of each codeword as its final BoW feature vector, as sketched below.
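The counting step is straightforward; the following is a minimal sketch, assuming each SIFT point inside the detected salient regions of a shot has already been assigned to its nearest codeword (e.g. by clustering the dictionary). The container types and function name are illustrative assumptions.

#include <vector>

// codewordIds: nearest-codeword index of every word (SIFT point) in the shot.
// K: size of the compressed codeword dictionary.
std::vector<int> bagOfWords(const std::vector<int>& codewordIds, int K) {
    std::vector<int> histogram(K, 0);
    for (int id : codewordIds)
        ++histogram[id];      // count occurrences of each codeword
    return histogram;         // the shot's final BoW feature vector
}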


Figure 3-2. Left: two semantically similar frames. Right top: words and codewords. Bottom: a BoW vector. Photos courtesy of the animation Big Buck Bunny by the Blender Institute [5].

Fig. 3-2 shows the original words for a frame, the compressed codewords and also the bag-of-words feature. Finally, we exploit spectral clustering [51] to cluster shots into several classes, each of which corresponds to a concept.

3.4 The Summarization Methodology

3.4.1 Criteria of a Good Video Summary

The most significant criterion of a good summary is that it preserves all original concepts. If any concept is absent from the summary, viewers may be misled by fractional clues and fantasize a totally deviated plot. Also, concepts should be rendered with balance, i.e. in the summary there are equal or comparable occurrences of instances for each concept. As every concept in the summary is semantically compact and decisive, ignoring or overemphasizing any one due to imbalanced rendering may mislead the viewers to recover a plot deviating from the original. The third criterion is that each concept is preferably rendered by the most interesting instances. Then, for a given length, the summary not only preserves the semantics but also triggers the viewers' interest to recover the plot.


3.4.2 Constrained Integer Programming

Given a clear semantic structure, we model video summarization as an integer programming problem. Let c_ij be the binary indicator that denotes whether we retain an instance in the summary or not, where s_ij is the jth instance of concept i. We want to find a combination of instances indicated by \hat{c} = \{\hat{c}_i\}, which aims to deliver all concepts with the most interestingness, or equivalently, maximizes the total attention saliency for each concept.

\hat{c}_i = \arg\max_{c_i} \sum_{j=1}^{n_i} c_{ij}\,\sigma(s_{ij}), \quad \forall i = 1 \ldots N    (3)

s.t. \quad \sum_{i=1}^{N}\sum_{j=1}^{n_i} c_{ij} \leq r    (3)

\min(\lfloor r/N \rfloor, n_i) \leq k_i = \sum_{j=1}^{n_i} c_{ij} \leq \min(\lceil r/N \rceil, n_i), \quad \forall i \in 1, \ldots, N    (3)

c \in \{c_{ij} \mid \arg\max_{c_{ij}} \sum_{i=1}^{N}\sum_{j=1}^{n_i} c_{ij} I_i\}    (3)

where \sigma(s_{ij}) is the saliency of s_ij, n_i is the number of instances of the ith concept in the original video and k_i is the number of instances of concept i in the summary. N is the total number of concepts in the original video. r is the maximum number of instances we can retain in the summarized video. I_i is the importance rank of concept i, which depends on the number of frames for this concept pattern, based on the common sense that a director naturally distributes more time to portray a more important concept.

Constraint I in Eq. (3) comes from the summary length limit. In our approach, we cap the maximum number of frames in a detected shot below some constant number, so r is almost proportional to the predefined summarization ratio. Constraint II in Eq. (3) is imposed by the concept completeness and balance criterion. Given a summarization ratio, we always try to deliver all concepts using commensurable numbers of shot instances.


Constraint III in Eq. (3) deals with the critical situations when r decreases to a value so small that the summary cannot keep all concepts with even one shot instance each, or when the summary length does not allow all concepts to be delivered with absolutely equal numbers of instances. Here we give priority to concepts with larger importance to be kept in the summary.

Considering constraint II in Eq. (3), the number of instances for each concept should be almost the same, if achievable, no matter how long the summary is. Therefore, we construct the summary in a bottom-up fashion: starting from an empty summary, each time we allow only one instance from one concept to be admitted into the summary, and we continue this process until the resultant summary length reaches its limit. Therefore, no matter when the process terminates, we may guarantee that the numbers of instances for different concepts differ only by one (except when all instances are recruited in a long summary). The scalability of the summary is straightforward: if the summary length is short, we keep a modest number of the most distinguishable instances for every concept. If the summary length is longer, we may add more instances for every concept, rendering a more detailed summary. This trait of adapting the detail preservation in the summary to variable lengths suits very well the needs of ubiquitous multimedia terminals that accept different levels of summaries.

In the algorithm, which instance will be included in the summary is the key question. An instance is indexed by the rank of the concept it belongs to and also by its rank within the concept. As per the aforementioned assumption, we rank the concepts according to their lengths in the original video. As for the ranking within a concept, we rank the instances of a concept according to their saliency values. Thus, the admitted instances for a concept contain the highest interestingness accumulation for the given summary length, fulfilling the objective function in Eq. (3); a sketch of this round-robin construction follows.
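The following is a minimal sketch of the bottom-up construction, assuming the concepts are already ordered by descending importance I_i and the instances of each concept are pre-sorted by descending saliency; the data layout is an illustrative assumption.

#include <utility>
#include <vector>

// instances[i]: instance ids of concept i, sorted by descending saliency.
// r: instance budget. Returns (concept, instance) pairs of the summary.
std::vector<std::pair<int, int>>
buildSummary(const std::vector<std::vector<int>>& instances, int r) {
    std::vector<std::pair<int, int>> summary;
    std::vector<size_t> next(instances.size(), 0);  // per-concept cursor
    bool admitted = true;
    while ((int)summary.size() < r && admitted) {
        admitted = false;
        // One pass admits at most one instance per concept, so the counts
        // k_i stay within one of each other (constraint II); concepts are
        // visited in importance order, honoring constraint III when r is small.
        for (size_t i = 0; i < instances.size() && (int)summary.size() < r; ++i)
            if (next[i] < instances[i].size()) {
                summary.emplace_back((int)i, instances[i][next[i]++]);
                admitted = true;
            }
    }
    return summary;
}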


Here, based on the human attention modeling in [47], we define the visual saliency \sigma_v(s_i) of a shot s_i as the average saliency over all frames t within the shot,

\sigma_v(s_i) = \frac{1}{|s_i|}\sum_{t=1}^{|s_i|}\sigma(t)    (3)

3.5 Experiment Results

3.5.1 An Example to Illustrate Our Algorithm

In this section, we evaluate the performance of the proposed concept recognition method and also the overall video summarization approach, with two summarization ratios, 50% and 20%. We implement the proposed approach in C code. The evaluation video sequence is the 10-minute Big Buck Bunny, which comes from the Open Movie project without license issues. Fig. 3-3 shows the recognized concepts. Since the original video is too long to visualize, we illustrate frames sampled every 15 seconds from the original video. In Fig. 3-3, each concept consists of the frames with the same border color. It shows that the frames expressing similar semantics are indeed clustered into the same concept. For example, the frames in the red box all depict the bunny playing with the butterfly; the frames in the green box all show the bunny standing by himself. Also, we visualize three frames with no border color, as they are the outliers of the clustering. This result can be explained by their negligible occurrences in the video: they are too trivial to express a concept. For the number of concepts (the complexity of the clustering model), we enumerate 5, 10, 15, 20 and 25 as the candidates to measure the clustering error and use Occam's razor to finalize it as 10. The clustering result indicates our concept recognition method works well.

Fig. 3-4 shows the summary results at the 50% and 20% ratios, respectively. Note that the 20% summary is a subset of the 50% summary. This result suggests that the summary can be packaged into multiple layers, with each layer as the summary difference between two adjacent length scales.


Figure 3-3. The images with the same color belong to the same concept. Photos courtesy of the animation Big Buck Bunny by the Blender Institute [5].

Figure 3-4. Top: the 50% summary. Bottom: the 20% summary. The concepts are complete in each summary. Photos courtesy of the animation Big Buck Bunny by the Blender Institute [5].

This fact suits the scalability of the multimedia system, since a terminal can, depending on its own needs or situation, decide how many layers to receive without changing the summarization process. Also, both summaries contain complete concepts and thus preserve the semantics effectively. It is natural for the 50% summary to include more details, since its summary length allows it.

3.5.2 Subjective Evaluation

We also carry out a subjective test of the human response to the generated summaries. We adopt two metrics, informativeness and enjoyability, proposed in [42] to quantify the quality of the summary under different summarization ratios. Enjoyability reflects the user's satisfaction with the viewing experience. Informativeness assesses the capability of maintaining content coverage while reducing redundancy.


Table 3-1. The statistics of scores of four video clips. Video courtesy of [4-6]. E: enjoyability. I: informativeness.

                     Our approach        Scalable approach [52]
Video  Ratio           Mean    Var.        Mean    Var.
LoR    50%     E       64.05   18.21       62.88   19.82
               I       66.08   18.71       65.50   18.67
BBT    30%     E       63.28   20.31       61.07   21.79
               I       69.82   18.56       66.49   19.60
BBB    20%     E       61.25   21.86       62.30   20.32
               I       68.00   16.90       67.50   17.26
CNN    10%     E       55.06   19.59       53.92   20.19
               I       60.87   18.58       59.49   17.15

The subjective test is set up as follows. First, to limit the viewing tiredness of the participants, we carefully picked four test videos: a 7-minute movie clip from Lord of the Rings (LoR) from the MUSCLE movie database (http://poseidon.csd.auth.gr/EN/MUSCLE_moviedb/index.php), a 6-minute TV clip from The Big Bang Theory (BBT) [4], a 10-minute animation clip from Big Buck Bunny (BBB) [5] and a 6-minute news clip from CNN Student News (CNN) [6]. Then, we chose summarization ratios 50%, 30%, 20% and 10% for LoR, BBT, BBB and CNN respectively, to consider both the ordinary summarization cases and the extreme cases. We use both our approach and the scalable approach in [52] to generate the summaries. During the test, each participant was asked to give each summary enjoyability and informativeness scores in percentage, ranging from 0% to 100%. We collected the scores submitted by 60 participants and calculated the means and variances of the scores for the two approaches, as indicated in Table 3-1.

The scores of our approach show that by reducing 50% to 90% of the video content, the enjoyability and informativeness drop by less than 50%, which is an encouraging result. Also, compared with the scalable approach, our scores in all informativeness and most enjoyability cases are slightly better. These results suggest our approach may preserve more semantic information.


CHAPTER 4
PERCEPTUAL QUALITY ASSESSMENT

4.1 Background

Due to the tremendous advances in video compression and network communication technologies, video media in much larger resolutions (1280x720, 1920x1080) has gained popularity with content providers and is increasingly supported in Video-on-Demand, live broadcasting and video telephony applications [53, 54]. When wired or wireless networks occasionally fail to deliver media packets, the received media may be implicated by the packet loss events. Different from traditional video media, each of the video packets in large resolution cases may impair a portion of a frame instead of dropping the whole frame. However, the impairment may propagate through the sequence of frames that follows; the degradation becomes much more noticeable as it stands in sharp contrast with the adjacent unaffected regions. Thus, in current HD video transmission, the degradation from a packet loss is significantly different from conventional artifacts like frame freeze and jitter [55]; instead, the packet loss event results in a unique glittering block contamination when the received content is displayed, and the contamination aggravates along the object motion traces in the video media. Fig. 4-1 shows the glittering artifact propagating through a sequence of frames along the skater's motion traces.

In order to better predict and thus control the degradation resulting from packet loss, an automatic subjective quality measurement (perceptual quality) mechanism dedicated to packet loss related degradations is required. The perceptual quality brings the benefit of quantifying the end-to-end distortion of a video transmission system and guiding its configuration. Also, the perceptual quality may help content operators to assess the quality of their provided service in terms of human visual satisfaction, which potentially differentiates the various service levels and thus helps to generate revenues more efficiently.


Figure 4-1. The glittering blocks originate from transmitted packet loss in the first frame. As the frames that follow depend on the first frame, the packet loss artifacts aggravate along the motion trace, appearing as glittering block contamination. Photos courtesy of the video trace library [3] for YUV video sequences.

Quality metric techniques can be divided into full reference [56], reduced reference [57] and no reference [58] methods according to how much information about the reference video frame is available when assessing the received and decoded video. Full reference methods assume that all pixels of the original video are available; reduced reference methods assume no full pixel information is available but some sparse parameters/features extracted from the original video are accessible; no reference methods assume that there is no original video information at all. From another taxonomy point of view, quality metric techniques can be categorized into objective, perceptual and subjective quality methods based on the level of the distortion responses. Objective quality measurement techniques such as Peak Signal-to-Noise Ratio (PSNR) and Sum of Squared Errors (SSE) are signal level metrics. It is very easy and lightweight to compute them as well as their derivatives. Thus many video transmission systems use them as distortion metrics to adapt the encoding and decoding to the varying channel conditions.


However, it has been shown in many works that the objective quality measurements do not necessarily correlate well with the human visual response [59]. On the other hand, subjective quality refers to a numeric that quantifies the true human response to video degradation. Although it is the true value of the human response, the tedious requirement for labor and the constraints of a laboratory-like environment prohibit its use for real-time and large scale quality assessment. In between, the perceptual quality [60-63] is a computational model for quality measurement which aims at mimicking the human score response. It enjoys the strength of a close relation to human scores as well as the capability to be conveniently applied in video transmission systems due to its automatic nature.

The general methodology for a perceptual quality metric is to extract factors from either the received bitstream or the decoded pictures and combine them using a model to predict the quality value. For the factor extraction, many techniques [7][64] utilize compression parameters such as frame type and motion vector strength, and packet loss parameters such as the length of the loss burst and the loss location. These methods provide the opportunity to estimate the quality at the network level; however, since these factors do not directly shape the degradation, there is always a cognitive gap between them and the human response. To tackle this problem, these methods also define a model, such as a generalized linear model, support vector machine or decision tree, to map the factors to the human response score and train the model from samples of packet loss patterns vs. human response scores. Generally, the model parameters are solely obtained by training, while sometimes the model form is also trained (e.g. the decision tree). In fact, the mapping from factors to the human score involves a complicated error propagation process and a psychological response process. Therefore, the highly non-linear nature of the mapping may not be properly matched by a purely data-driven model trained with a limited number of samples generated by a limited number of packet loss patterns. As a result, the model may be sensitive to variations of the packet loss patterns and encoding parameters.


In this dissertation, we propose a perceptual quality model that addresses the human cognitive response to video packet loss degradation. The model is a no-reference quality metric, since it is designed for a video transmission system where no information about the original video is available. Aiming at first-hand, psychologically sensible factors related to the human grading score, the author conducts a survey that focuses on discovering the concrete psychological effects when human viewers perceive the corresponding artifacts. The proposed model then quantifies the two discovered factors, obvious face deformation and glittering block edge strength, using image processing techniques. The combination of the two factors is learned from training samples (video clips with degradation vs. human response scores) with a linear regression technique. Since the model exploits direct psychological factors, which are cognitively explainable, the model itself can be represented in a simpler form and the training process is much easier, as opposed to the case where indirect transmission and video stream layer factors are used and thus a much more complicated model is expected. Also, our model is independent of most pre-decoding factors; thus, when the actual transmission system has different encoding parameters and transmission packet loss events from the ones used for model training, the proposed model is more robust.

4.2 The Construction of the Training Sample Database

In order to learn the perceptual quality model to predict human response, video footage vs. human score samples for training are required. A compelling sample database here should span a variety of video contents, contain a wide distribution of the hypothetical reference circuits (the encoding, transmission and decoding chain) that generate the artifacts, and also have human scores that cover the entire score range fairly.


4.2.1 The Weakness of Publicly Available Databases

Currently, the publicly available human score databases include the Video Quality Expert Group (VQEG) HDTV data [65], the LIVE Video Quality Database and the LIVE Mobile Video Quality Database [66]. The first two databases aim at a variety of video samples with common artifacts such as blurring, noise, compression loss and transmission loss. Since they are databases for all-inclusive perceptual quality models, they do not narrow down to the specific artifacts related to packet loss during transmission, and hence do not provide enough samples with various hypothetical reference circuit configurations. In contrast, the LIVE Mobile database is a dedicated sample set for the artifacts in mobile applications, and it contains video samples generated by various encoding and transmission parameters. However, the database seems to focus on the mapping of encoding and transmission factors, rather than psychological factors, to the subjective quality value. Therefore these databases do not provide information about the psychological effects of human response to the artifacts, which is instead our proposed methodology.

4.2.2 Video Sample Generation

In this dissertation, the author focuses on modeling the perceptual quality of the artifacts resulting from packet loss and provides a new sample database that does not only contain the Hypothetical Reference Circuit (HRC) parameters vs. human scores but also information about how human viewers psychologically respond to the artifacts and the reasons why they give a certain score level. The author attempts to generate packet loss related artifacts as completely as possible so that the samples in the dataset include a wide variety of encoding and transmission parameters.

In this database, the video footage is generated by varying the encoding and transmission parameters. The original videos comprise five sequences, crew, ice, harbor, crowdrun and parkjoy, from the VQEG website [67]. The resolutions are 4CIF (704x576, used in DVD) for the first three and 720p (1280x720, used in HDTV) for the last two.


Each of the five sequences has a duration of around 10 s and carries a unique motion and texture form, including high-foreground-low-background motion, low-foreground-high-background motion, human faces, natural scenes and human crowds. Since we focus on the artifacts related to packet loss rather than other artifacts such as compression loss, we encode each video sample with a fairly high bitrate of 6 Mbps. We choose an IPPP encoding structure with a Group of Pictures length of 25 to guarantee the artifact is able to propagate and is noticeable to the viewers. Also, coding tools in the H.264 AVC standard such as Arbitrary Slice Order and Flexible Macroblock Ordering are designed to counteract loss events: for bitstreams encoded using different slice groups (partitions of a frame), the same loss events may result in significantly different artifacts. Therefore we apply different slice group parameters to encode the same video, leading to diverse forms of artifacts in packet loss events. As regards the setup of transmission, we packetize each bitstream in conformance with the H.264 Annex B standard (each slice is contained in one packet and the delimiter is 0x000001) and arrange that the packet loss events happen after the 3rd second, to prevent viewers from neglecting artifacts prematurely. We also assume the loss pattern is a burst loss, as this is a frequent case in current mobile applications. By varying the number of lost packets and the location of the first packet loss, we generate a collection of 99 corrupted bitstreams. The corrupted bitstreams are then decoded by JM 16.0 using the error concealment method (motion copy), and we obtain all the videos with artifacts. Table 4-1 summarizes the distribution of the generated video samples with their characteristics and the HRCs used.

4.2.3 Score Collection and Survey Conduction

In the score collection phase, we adopt the recommendation of the VQEG HDTV-1 test plan [68]. We select 26 viewers to rate 99 video samples with packet-loss related artifacts.


Table 4-1. The distribution and properties of the generated video clips with packet loss artifacts. I: interlaced, C: chessboard, R: region of interest. Courtesy of the video trace library [3].

Video     Sample size  Video size  Motion property    Slice partition  Packet loss length  Error concealment
Crew      27           CIF         High fg, low bg    I, C, R          3, 5, 7             Motion copy
Ice       18           CIF         High fg, low bg    I, R             3, 5, 7             Motion copy
Harbor    18           CIF         Low fg, low bg     I, R             3, 5, 7             Motion copy
Crowdrun  18           720p        High bg, high fg   I, C             3, 5, 7             Motion copy
Parkjoy   18           720p        High bg, low fg    I, C             3, 5, 7             Motion copy

The viewer candidates consist of the following categories, based on their backgrounds related to video processing expertise: video coding scientists, Electrical and Computer Engineering graduate students, general engineering students, photography hobbyists, animation artists and amateurs. The viewers have original/corrected vision acuity of at least 20/20. Each participant is presented with the video samples as YUV documents, displayed by a YUV viewer running on DELL desktops with 1280x720 resolution. The distance of the participants facing the computer screen lies between 40-60 inches. Considering that video samples with the same content may easily make viewers feel fatigued, we divide the 99 ratings into four sessions, with each session less than 30 minutes. Viewers are allowed to watch the same sample back and forth until they reach a confident rating. In order to counteract the users' adjustment to certain loss patterns, and thus a possible unnecessary forgiveness towards later video samples, we randomize the order of the video samples for each viewer. In the rating process, the five original videos are also presented without notifying the viewers. The viewers are asked to give one score using the Absolute Category Rating Scale (ACRS) ranging from 1 to 5, according to their psychological satisfaction with the perceived artifacts. Also, in order to discover the psychological factors, the viewers are asked to choose at least 20 video samples and answer the questions in Table 4-2. The ratings of the 20 chosen video samples are preferred to span the score range equally, since we aim at psychological factors explainable for all scores. After the score collection, we subtract each rating score from the score of its original video to calculate the Difference Score. Then for each video sample, we calculate the mean value of the Difference Score over all participants as the final Difference Mean Opinion Score (DMOS), as in the sketch below.
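A minimal sketch of the DMOS computation for one video sample, assuming each participant's score for the impaired clip and for its (hidden) original are available; the names are illustrative.

#include <vector>

// ratings[p]: participant p's score for the impaired clip;
// refs[p]: the same participant's score for the original video.
double dmos(const std::vector<double>& ratings, const std::vector<double>& refs) {
    double sum = 0.0;
    for (size_t p = 0; p < ratings.size(); ++p)
        sum += ratings[p] - refs[p];   // per-viewer Difference Score
    return sum / ratings.size();       // mean over all participants
}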


Table 4-2. The questions for the video samples chosen by each viewer.

ID  Question
1   Do you think you have a good reason to give such a score?
2   Why do you think it is such a score?
3   What did you observe?

4.2.4 Analysis of the Survey Result

After manually parsing the answers attached to the chosen video samples, we find that 97% of participants explain their ratings using reasons that can be attributed to the following two aspects: how the human face is distorted, and the strength of the glittering blocks. Therefore, we conclude that these two aspects are the dominant psychological factors of the human response to the video packet loss related artifacts. In Sec. 4.3 and Sec. 4.4, we elaborate the image processing techniques we use to describe the two factors.

4.3 Glittering Artifact Detection

This section addresses the measurement of the first dominant psychological factor, the strength of glittering blocks. Fig. 4-2 shows a typical video frame impaired by packet loss related artifacts. The areas indicated by the red arrows show the exemplar appearance of the glittering blocks. As shown in Fig. 4-2, the areas have crystal shapes and edges and also demonstrate significant contrast with the neighboring unaffected regions. Also, the colors of the areas appear to be artificial, imposing a distinguishing discontinuity over the natural unaffected regions. Therefore, we propose an edge-map detection method to capture the discontinuity of the areas.

Note that the reason why the glittering blocks have such a regular appearance lies in the block-based encoding in compression standards such as H.264, as well as the fact that a lost packet induces an integer number of blocks lost on the decoded frame. When a packet is lost during transmission, the corresponding slice data containing multiple 16x16 Macroblocks (MBs) is lost; therefore the boundary lines between the artifacts and the unaffected regions follow the 16x16 lattice of the frame.


Figure 4-2. An exemplar frame in the video crew degraded with glittering blocks. Photos courtesy of the video trace library [3] for YUV video sequences.

The decoder tries to handle the lost MBs gracefully using error concealment techniques, including copying the content from the referred MBs of the lost MB along the motion vectors. However, due to erroneous motion estimation, encoding modes and the MB residue, the referred MBs are not necessarily consistent with the lost MB's adjacencies. Therefore, it is still possible for humans to observe the artifacts.

4.3.1 Edge Map Generation

Based on the analysis, we first apply a high pass filter over the 16x16 lattice of the decoded frame to emphasize the possible locations where artifacts can occur. The high pass filter can be implemented based on Eq. (4), and the edge map E(x,y,i) is calculated by applying the filter on the r, g, b color planes of the original frame I(x,y,i) respectively,


E(x,y,i) = \begin{cases} 0 & x \bmod 16 \neq 0,\ y \bmod 16 \neq 0 \\ \frac{1}{2}\sum_{v=y-1}^{y+1}\frac{1}{2^{|v-y|}}\,|\Delta_x I(x,v,i)|\,(1+e^{-\theta}) & x \bmod 16 = 0,\ y \bmod 16 \neq 0 \\ \frac{1}{2}\sum_{u=x-1}^{x+1}\frac{1}{2^{|u-x|}}\,|\Delta_y I(u,y,i)|\,(1+e^{-\phi}) & x \bmod 16 \neq 0,\ y \bmod 16 = 0 \end{cases}    (4)

where \Delta_x I(x,v,i) = I(x,v,i) - I(x-1,v,i) and \Delta_y I(u,y,i) = I(u,y,i) - I(u,y-1,i); \theta = \min\{|\Delta_x I(x-1,v,i)|, |\Delta_x I(x+1,v,i)|\} and \phi = \min\{|\Delta_y I(u,y-1,i)|, |\Delta_y I(u,y+1,i)|\}. (x,y) and (u,v) are locations of an edge pixel and i is the index of the color plane. The final edge value for a pixel is the maximum edge value of the three color planes,

E(x,y) = \max\{E(x,y,r), E(x,y,g), E(x,y,b)\}    (4)

Figure 4-3. Artifact along an MB boundary with two pixels on each side.

Fig. 4-3 shows the rationale of Eq. (4). The vertical dashed line is a vertical boundary in the 16x16 lattice. Pixel 3 is right on the lattice, so pixels 1 and 2 belong to one MB while pixels 3 and 4 belong to another. For a typical artifact that occurs across the two MBs, the pixel value discontinues from pixel 2 to pixel 3, indicated by the red curve, whereas the pixels that belong to the same MB tend to have similar pixel values, indicated by the blue curves. Thus we use the absolute difference value of the two pixels as the block edge strength indicator. The indicator is further amplified if at least one pair of pixels in the same MB demonstrates similar pixel values. Also, to make the indicator more robust to noise, we average the edge along the MB boundary direction; a sketch follows.
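The following is a minimal sketch of the filter on one color plane, showing only the vertical MB boundaries (the horizontal case is symmetric) and evaluating the amplification term at the central row for brevity; the types and border handling are illustrative assumptions, and the full detector would take the per-pixel maximum over the three color planes as in Eq. (4).

#include <opencv2/core.hpp>
#include <algorithm>
#include <cmath>

cv::Mat latticeEdgeMap(const cv::Mat& I /* CV_8UC1, one color plane */) {
    cv::Mat E = cv::Mat::zeros(I.size(), CV_32F);
    for (int y = 1; y + 1 < I.rows; ++y)
        for (int x = 16; x + 1 < I.cols; x += 16) {   // vertical MB boundaries
            // Weighted average of the cross-boundary difference over the
            // boundary direction (weights 1/2^{|v-y|}, normalized by 1/2).
            double acc = 0.0;
            for (int v = y - 1; v <= y + 1; ++v) {
                double w = (v == y) ? 1.0 : 0.5;
                acc += w * std::abs((double)I.at<uchar>(v, x) -
                                    (double)I.at<uchar>(v, x - 1));
            }
            // theta: the smaller of the two same-MB pixel differences; a
            // small theta (smooth same-MB pairs) amplifies the indicator.
            double left  = std::abs((double)I.at<uchar>(y, x - 1) -
                                    (double)I.at<uchar>(y, x - 2));
            double right = std::abs((double)I.at<uchar>(y, x + 1) -
                                    (double)I.at<uchar>(y, x));
            double theta = std::min(left, right);
            E.at<float>(y, x) = (float)(0.5 * acc * (1.0 + std::exp(-theta)));
        }
    return E;
}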


In this edge location selection step, we aim at selecting all possible edge locations, so we use the maximum value of the three color planes as the final edge value. Note that the artifacts can only happen at the MBs that are lost or affected (i.e. their referred MBs are lost), so we mask the edge detection onto the affected regions. Fig. 4-4 shows the edge location selection result for Fig. 4-2.

Figure 4-4. The result of edge location selection.

4.3.2 Structure Detection

Note that the output of the edge map is a pixel-wise edge value, while the artifacts are attended in a block-wise fashion by the viewers. Fig. 4-5 is a close-up view of the artifact region in Fig. 4-2 and its detected edge values from the previous step.


The edge strength of the artifact against its adjacency, highlighted by the red circle, varies along the MB boundary, which is best reflected by the variation of the edge strength in the right edge map. However, humans tend to psychologically regard the artifacts occurring along an MB boundary as atomic, instead of treating each edge pixel individually. Therefore, structuralizing the pixel-wise edge values into a structure-wise representation may improve the measurement of the glittering block factor. We first binarize the edge map with a threshold \theta. Then, for each boundary (horizontal or vertical) of the 16x16 lattice, if the number of edge pixels whose value is above \theta is more than 4, we mark all 16 boundary pixels (two 8-pixel halves) as edge pixels; otherwise we remove them from the edge pixels. In this manner, the detected edge map is 8-pixel-structure-wise; a sketch of this step is given below.

Figure 4-5. Block-wise perception vs. pixel-wise detection. Photos courtesy of the video trace library [3] for YUV video sequences.
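The following is a minimal sketch of the structuralizing step on vertical boundaries, assuming the pixel-wise edge map from Sec. 4.3.1; the horizontal case is symmetric, and the container types are illustrative.

#include <opencv2/core.hpp>

cv::Mat structuralize(const cv::Mat& E /* CV_32F edge map */, float theta) {
    cv::Mat S = cv::Mat::zeros(E.size(), CV_8U);
    for (int x = 16; x < E.cols; x += 16)                 // vertical boundaries
        for (int y0 = 0; y0 + 16 <= E.rows; y0 += 16) {   // one MB-long segment
            int strong = 0;
            for (int y = y0; y < y0 + 16; ++y)
                if (E.at<float>(y, x) > theta) ++strong;
            if (strong > 4)                               // keep as a structure
                for (int y = y0; y < y0 + 16; ++y)
                    S.at<uchar>(y, x) = 255;
            // otherwise the segment is cleared (left at zero)
        }
    return S;
}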


4.3.3 False Intersection Suppression

When we choose \theta conservatively to avoid missing edge locations, we unnecessarily include some false edge structures. The most common case is the edge intersections in Fig. 4-4. Normally, a crystal-shaped glittering block appears to be convex, so only the contour is desired in the edge map instead of the internal intersections caused by block-based edge detection. Although the scenario where an edge intersection is indeed desired can occur, its possibility is very low. Thus we suppress the edge intersection as a less valid edge structure.

Fig. 4-6 is a representation of the scenarios of the artifact distribution and their corresponding edge structures to be detected. Each square denotes an MB and is coded using a color. The artifacts happen on the boundaries of MBs in different colors. When considering four squares together, the first two scenarios happen most often, the third happens at times, but the fourth rarely. The possible reason for the rare case is that all four MBs become considerably different only when the area is originally filled with highly complicated texture and, at the same time, the error concealment method refers to very different regions to paint the missing regions.

Figure 4-6. Artifact layout scenarios with their desired edge structures to detect.

Thus, for an intersection detected by the edge location selection step, we suppress the intersection to reduce it to the second scenario in Fig. 4-6. We measure the edge strength in the horizontal direction and the vertical direction as the number of pixels above \theta, as in Sec. 4.3.1. If the strength of one direction is more than one and a half times that of the other, we remove the weaker direction from the edge map. Fig. 4-7 shows the edge map of Fig. 4-2 after structure detection and false intersection suppression. We use the number of detected edges in the edge map as the metric of the glittering block factor.

4.4 Face Deformation Estimation

The other psychological factor that dominates the human response to packet loss related artifacts is how human faces are deformed in the decoded frame. The experiments in Sec. 4.2.4 suggest that human viewers show little tolerance of the deformation of human face regions, as opposed to their forgiveness of other non-face regions. This situation is best explained by Fig. 4-8. The two images are two decoded frames with artifacts; they correspond to the same original frame but went through different hypothetical reference circuits.


Figure 4-7. The result of structural detection followed by false intersection suppression.

The left image results from an HRC with 3 lost packets and the right image from an HRC with 5 lost packets. In the appearance of the artifacts, the contrast of the affected region against its adjacency, which is highlighted by the red circle in each of the two images, is much sharper in the right image than in the left. However, the MOS of the right image is 3.6 while that of the left image is 2.1. The survey participants explain that their ratings for the left image are based on the fact that the human face deformation can be acutely perceived. Therefore, face region deformation should be treated independently from the other glittering blocks.

The mechanism of a deformed human face region is that the MBs or partitions within the region are copied from external frames with different displacements/motion vectors. While the decoder makes its best effort to choose a suitable referred region for each MB/partition, the copied contents do not necessarily match, especially when the encoding modes of the face region are complex, such as inter 4x4 and intra 4x4.

PAGE 77

Figure 4-8. Visually perceived annoyingness vs. the contrast of the artifact against its adjacency. Photos courtesy of the video trace library [3] for YUV video sequences.

As a result, the lower subregion has a larger motion vector than the upper region, overriding the middle area of the face region.

Figure 4-9. A deformed face region and its two subregions lost and copied from inconsistent referred areas. Photos courtesy of the video trace library [3] for YUV video sequences.
We propose a motion-copy vector disorder metric to estimate possible face deformation. First, we use the Viola-Jones face detector [69] to locate a human face region. Then we calculate the deformation value according to the following estimator:

D(r) = -\sum_{sr \in r} p(mv(sr)) \log p(mv(sr))

where D(r) is the estimated value of face deformation, r is the detected face region, consisting of subregions indexed by sr, mv(sr) is the displacement/motion vector of the subregion sr from the referred region it is copied from, and p(mv(sr)) is the distribution of the motion vectors within the region r. We quantize motion vectors into discrete intervals and approximate the distribution of mv using the histogram of the mv intervals. The entropy-like estimator above measures how different the motion vectors are from each other, and thus denotes the likelihood of face deformation.
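The estimator is straightforward to prototype. In the sketch below, the face region is assumed to have been located already (e.g., with an off-the-shelf Viola-Jones cascade) and its per-subregion motion vectors parsed from the bitstream; bin_size is an illustrative quantization step.

```python
import numpy as np

def face_deformation(face_mvs, bin_size=4):
    """Entropy-like deformation estimate D(r) for one detected face region.

    face_mvs: (N, 2) array with the displacement/motion vector of each
    subregion sr inside the face region r. Vectors are quantized into
    discrete intervals and p(mv) is approximated by their histogram.
    """
    q = np.floor_divide(np.asarray(face_mvs), bin_size)  # quantize mv intervals
    _, counts = np.unique(q, axis=0, return_counts=True)
    p = counts / counts.sum()                            # histogram -> p(mv(sr))
    return float(-np.sum(p * np.log(p)))                 # -sum p log p
```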

4.5 Model Training to Map Psychological Factors to Perceptive Scores

Since the two factors we extract are closely related to the human response score, we use a simple linear model to link them to the MOS. The parameters of the model are obtained by the linear regression method:

y = X\beta + \epsilon

As shown above, linear regression assumes that the response y is the inner product of the factor vector X and the coefficient vector \beta plus a noise term \epsilon. The regression technique solves for \beta by minimizing the error between the predicted response and its actual value:

\hat{\beta} = \arg\min_{\beta} \sum_{(x_i, y_i) \in \text{training set}} \|y_i - X_i\beta\| + \lambda \sum_{(x_i, y_i) \in \text{testing set}} \|y_i - X_i\beta\|

This is the optimization used in the regression technique with cross-validation.
The first term on the right side is the fitting error of the training set and the second term is the prediction error of the testing set; \lambda is the trade-off between the two terms. In cross-validation, we split the samples of the database into two non-overlapping subsets, a training set and a testing set, and jointly minimize the error to avoid model overfitting. In fact, we adopt 10-fold cross-validation: the samples are randomly split according to the ratio 9:1 (training set vs. testing set) and this process is repeated 10 times until the final value of \hat{\beta} is obtained.
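A compact sketch of this training loop is given below, with scikit-learn standing in for the Weka toolchain actually used in the experiments; returning the last fold's model is a simplification of the procedure described above.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold

def ten_fold_fit(X, y):
    """Fit the linear perceptual model with 10-fold cross-validation.

    X stacks the two factors (E, D) per sample, y holds the MOS values;
    the held-out fold errors guard against overfitting.
    """
    test_errors, model = [], None
    for train, test in KFold(n_splits=10, shuffle=True).split(X):
        model = LinearRegression().fit(X[train], y[train])
        test_errors.append(np.mean(np.abs(y[test] - model.predict(X[test]))))
    return model, float(np.mean(test_errors))
```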

4.6 Experiments

We use 27 video samples as the testing set. The video samples with packet loss artifacts are generated from 5 video sequences (crew, ice, harbor, crowdrun and parkjoy), using the same HRCs as in Sec. 4.2.2. The human score collection process also follows the routine in Sec. 4.2.3. After obtaining the glittering block edge and human face deformation estimators for the image in each video sample, we use the software Weka 3.6.8 [70] to perform the linear regression. The trained model to predict the human score is

score = -0.0193 E - 0.4982 D + 3.7578

where E is the measure of glittering blocks in Sec. 4.3 and D is the measure of human face deformation in Sec. 4.4.
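Applying the trained model is then a one-line function of the two factor measurements:

```python
def predict_mos(E, D):
    """Predicted human score from the glittering block measure E and
    the face deformation measure D (coefficients as reported above)."""
    return -0.0193 * E - 0.4982 * D + 3.7578
```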

We also compare with the bitstream/QoS-factor-based perceptual model [7]. The available bitstream/QoS factors (packet loss length, initial packet loss location and bitstream slice group) and a decision tree model are used to predict the perceptual quality scores. Fig. 4-10 shows the scatter plot of the predicted quality scores vs. the actual human scores for the method in [7] and our proposed method. As shown in the figure, the scores predicted by the proposed method are closer to the actual scores, with less variance.

Figure 4-10. Red: the QoS/bitstream factor based approach. Blue: the proposed approach.

We also use Pearson's correlation coefficient, the Spearman rank order correlation and the relative root mean square error (RRMSE), defined below, to compare the two methods numerically. Table 4-3 gives the results.

Table 4-3. Comparison of the perceptual quality model in [7] and our proposed perceptual model

  Model                            Pearson correlation r   Spearman rank correlation   Relative RMSE
  QoS/bitstream factor based [7]   0.7628                  0.7768                      12.18%
  Our proposed                     0.8559                  0.8104                      8.13%

The Pearson coefficient is defined as

r = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2} \sqrt{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}}

where X_i, Y_i are the predicted and actual scores and \bar{X}, \bar{Y} are their means.
The Spearman rank correlation is defined as

\rho = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2 \sum_i (y_i - \bar{y})^2}}

where x_i, y_i are the rank orders of X_i, Y_i above. The relative root mean square error (RRMSE) is defined as

RRMSE = \frac{\sqrt{\frac{1}{n} \sum_{(X_i, Y_i) \in \text{testing set}} (Y_i - X_i)^2}}{\text{score dynamic range}}

where X_i, Y_i are the predicted and actual scores.
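The three criteria can be computed as follows. This is a sketch; SciPy stands in for whatever tooling was originally used, and score_range must match the actual dynamic range of the MOS scale (a 1-to-5 scale is assumed here).

```python
import numpy as np
from scipy import stats

def compare_scores(predicted, actual, score_range=4.0):
    """Pearson r, Spearman rho, and RRMSE between model and human scores."""
    predicted, actual = np.asarray(predicted), np.asarray(actual)
    r, _ = stats.pearsonr(predicted, actual)        # linear correlation
    rho, _ = stats.spearmanr(predicted, actual)     # rank-order correlation
    rrmse = np.sqrt(np.mean((actual - predicted) ** 2)) / score_range
    return r, rho, rrmse
```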

Thus, based on the Pearson and Spearman coefficients, our model is more correlated with the actual human scores than the bitstream/QoS factor based method. Based on RRMSE, our proposed perceptual model increases the prediction accuracy by 4%.
CHAPTER 5
PERCEPTUAL QUALITY BASED VIDEO ENCODER RATE DISTORTION OPTIMIZATION

5.1 Background

In the past 20 years of video compression standard evolution, each generation, from H.261 in the early 90s to the current HEVC, aims at creating encoding tools that can achieve the best rate and distortion trade-off, suited to the computational capacity of its time. The question of finding the best trade-off between rate and distortion is defined as rate-distortion optimization (RDO). The core aspects of RDO are finding the best encoding mode and choosing the quantization parameter (QP) for each block. The encoding modes include the prediction modes and also the partition modes, which essentially define how to predict the current block from its spatially or temporally neighboring blocks. The quantization parameter, critical in encoding the residue of a block subtracted from its prediction, indexes the step size used to quantize the residue into a discrete signal, which can be represented by a limited number of bits. Ideally, mode decision and choosing the QP should be jointly optimized; however, [71] reported that choosing the best mode depends on a known QP, since representing mode-related information needs bit budgets, where the QP is used to predict bit consumption. Also, choosing a QP requires the best mode to be settled, since the best mode produces the least block residual for the QP to exploit. Thus, it is a chicken-and-egg problem. To make the encoding implementable, in current encoder configurations a QP is given before finding the best mode [72][73][74]. Then, based on the best mode, the QP can be reevaluated and possibly adjusted for a second-pass encoding [75][76]. Therefore, the RDO can be simplified into the following mathematical form:

\hat{mode} = \arg\min_{mode} D(mode \mid QP) \quad \text{s.t.} \quad R(mode \mid QP) \le R_c
where D(mode|QP) is the distortion of the encoded block from its uncompressed version and R(mode|QP) is the bit rate consumed for encoding the block. R_c is the total bit rate budget for this block. This constrained optimization problem can be reformulated into an unconstrained optimization problem using the Lagrange method [77]:

\min_{mode} J = D(mode \mid QP) + \lambda R(mode \mid QP)

where \nabla D(mode \mid QP) = -\lambda \nabla R(mode \mid QP), \lambda > 0, i.e.,

\lambda = -\frac{\partial D(QP)}{\partial R(QP)}

When \lambda is available, the encoder loops over all the modes [78], calculates the corresponding objective value J, and uses the mode that results in the least objective function value. Note that to accelerate the optimization, many solutions have been proposed, including mode early termination [79] and mode scope reduction by machine learning [80]. Thus, the core question in RDO is to choose a suitable \lambda, which requires knowledge of the analytical form of D in terms of R. Throughout the encoder generations, the mean square error (MSE) is conventionally used as the distortion measure for its simplicity. In the literature, many R-D models [81][71] have been proposed to model the relation of distortion, as MSE, and rate. Based on the R-D models, the negative derivative with respect to R is assigned to \lambda. Specifically, in the H.264/AVC coding configuration, \lambda is suggested as 0.85 \cdot 2^{(QP-12)/3}.
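As a sketch of the resulting mode decision, with distortion and rate as placeholder callables for the encoder's per-block measurements:

```python
def h264_lambda(qp):
    """The suggested H.264/AVC multiplier: 0.85 * 2^((QP - 12) / 3)."""
    return 0.85 * 2.0 ** ((qp - 12) / 3.0)

def rdo_mode_decision(modes, distortion, rate, qp):
    """Exhaustive RDO: pick the mode minimizing J = D + lambda * R.

    distortion(m, qp) and rate(m, qp) stand for the encoder's measured
    distortion and bit cost of encoding the block under mode m.
    """
    lam = h264_lambda(qp)
    return min(modes, key=lambda m: distortion(m, qp) + lam * rate(m, qp))
```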

Meanwhile, it has been reported that MSE does not necessarily correlate with human visual characteristics (HVC) very well [82]. Thus, many perceptual quality based distortion metrics have been proposed [83][84][85], each of which claims that certain HVC are captured and incorporated into the proposed metric. Therefore, if an encoder employs a perceptual quality metric instead of MSE in its RDO framework, it is estimated that a better RD trade-off can be achieved. The intuition is that a perceptual quality metric distinguishes
certain distortion aspects to which the human visual system is most sensitive, so RDO can arrange bit budgets more wisely to accommodate those aspects, which a signal-level metric such as MSE does not consider. In this chapter, we use the most popular perceptual metric, SSIM [85], in our RDO framework:

SSIM(x, y) = l(x, y) \cdot c(x, y) \cdot s(x, y)
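For illustration, a single-window version of SSIM can be written as below; the reference metric of [85] averages this quantity over local windows, and the constants follow the conventional choice for 8-bit video, so this is a simplification rather than the exact metric.

```python
import numpy as np

def ssim_block(x, y, c1=(0.01 * 255) ** 2, c2=(0.03 * 255) ** 2):
    """Single-window SSIM l(x,y)*c(x,y)*s(x,y) over two aligned blocks.

    The contrast and structure terms are folded into one factor, as in
    the usual formulation.
    """
    x, y = np.asarray(x, float), np.asarray(y, float)
    mx, my = x.mean(), y.mean()
    cov = ((x - mx) * (y - my)).mean()
    luminance = (2 * mx * my + c1) / (mx ** 2 + my ** 2 + c1)
    contrast_structure = (2 * cov + c2) / (x.var() + y.var() + c2)
    return luminance * contrast_structure
```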

When a perceptual quality distortion metric is used in RDO, it is intuitive to replace D in terms of MSE with the perceptual quality metric in the Lagrangian objective. However, the challenge is that the Lagrange multiplier should be changed accordingly. In the literature, two types of methodology have been proposed for perceptual RDO. The first one is the experiment-based approach [86][87]. The authors in [86] first perform many encoding processes with MSE and perceptual quality based RDO, respectively. Based on the visualization of the RD samples resulting from the encoding, they assume that the RD curve of the perceptual quality based RDO parallels that of the MSE based RDO, and adopt the \lambda predicted by the MSE RDO into the perceptual quality based RDO. However, this assumption does not usually hold, as shown in Section 5.3. The other type is the information-theory based approach [88][89]. The authors represent the perceptual quality (SSIM) in terms of QP through the physics of the perceptual quality metric, together with the rate in terms of QP, and eventually derive the relation of D (SSIM) with the rate. However, because many perceptual quality metrics are non-parametric [7][90], it is unrealistic to relate them with QP in an analytical fashion. Also, the derivation uses many assumptions based on large numbers of samples, and thus the resulting asymptotic conclusion (the derived RD model) may have a gap with a concrete encoding process.

5.2 Perceptual RDO Framework by Piecewise Linear Approximation

It is desirable to have a comprehensive method to model the perceptual quality based distortion with the rate. Preferably, this method is not limited to a certain perceptual quality metric.
The Lagrange multiplier then can be derived as the negative derivative of the RD model. In this dissertation, a modeling method for a perceptual quality metric with respect to rate is proposed. The author takes SSIM as an example to perform the modeling, where approximation is used for easy implementation. This dissertation proposes a framework for perceptual quality metric based RDO in video encoding. The proposed method, as a type of experiment-based approach, produces an RD model associated with a perceptual quality distortion metric. The RD model is based on the best achievable RD trade-off and thus can give the best RD performance. The author first collects RD samples by running RDO with a set of Lagrange multipliers \lambda, where \lambda is enumerated in a way that its best value (to be found) is included. Then, based on the visualization of the RD samples on the RD plane, the envelope curve that encloses all the RD samples is recognized as the best achievable RD curve for the perceptual quality metric. Finally, the envelope/desired RD model is fitted using piecewise lines. The framework includes five modules, among them RD sampling, local RD curve fitting and piecewise envelope generation, as shown in Fig. 5-1.

Figure 5-1. The block diagram of the perceptual quality based RDO system.

In the RDO framework, a video frame is first input into the perceptual RD modeling unit for training purposes. After the RD model is learned, the best \lambda is derived as the negative derivative of D with respect to R.
Then the RDO processes for the following frames can be initiated. Since the video characteristics tend to have high correlation over a period of time, the following frames can use the learned RD model to choose their own desired Lagrange multiplier, without additional computational load. For the perceptual quality metric SSIM, this dissertation gives an exemplary RD modeling solution. In the RD sampling module, the MSE based RDO framework is utilized. Its \lambda is scaled to \lambda_0, the value that matches the SSIM dynamic range, and \lambda_0 is used as the Lagrange multiplier in the SSIM based RDO framework. Since the two RDOs give comparable encoding results and both results are visually good, the best \lambda_{SSIM} is presumably in the neighborhood of \lambda_0. Also, as shown in the RDO formulation above, running the RDO process for an encoding unit depends on a given QP. Therefore, RD sample points for the same QP but different \lambda_{SSIM} form a local RD curve, representing the local RD behavior under that particular QP. The family of local RD curves over different QPs spans an envelope that closely encloses them, which is apparently the RD bound of the perceptual quality metric. In the local RD curve fitting module, each local RD curve is fitted using a quadratic curve (a circle for SSIM) with least-squares regression. Then, for every two neighboring RD curves, a common tangent line is derived in the piecewise envelope generation module to capture the gradient of the RD envelope at that location. Note that all the tangent line segments form a piecewise approximation of the RD envelope.

5.3 RD Sampling

Notice that the conventional MSE based RDO is able to produce visually pleasant compression (blue line in Fig. 5-2). Thus its RDO framework can be used as a starting point for finding the best \lambda_{SSIM} for the perceptual quality based RDO. We propose a rescaling method to utilize the MSE based RDO. For SSIM based RDO, we rescale the \lambda of the MSE RDO into a value that matches the dynamic range of D_{SSIM}. The two RDO frameworks are shown below. After encoding blocks using the existing MSE-RDO, we
get the statistics of the average MSE metric and SSIM-distortion metric of a block. Their ratio is applied to scale the SSIM-RDO Lagrange multiplier:

\min_{mode} D_{MSE}(mode \mid QP) + \lambda_{MSE} R(mode \mid QP)
\min_{mode} D_{SSIM}(mode \mid QP) + \lambda_{SSIM} R(mode \mid QP)

where \lambda_{SSIM} = \frac{\bar{D}_{SSIM}}{\bar{D}_{MSE}} \lambda_{MSE}. In this configuration, each mode produces a similar rate-distortion trade-off for the two RDO frameworks, and thus similar modes should be chosen by both. Therefore, the rescaling method has comparable RD performance with MSE based RDO and also produces visually pleasant compression (purple line in Fig. 5-2). This fact suggests that the best \lambda_{SSIM} for SSIM based RDO should be in the neighborhood of the rescaled \lambda_{MSE}. In order to include the best \lambda_{SSIM}, we vary \lambda_{SSIM} with offsets in the interval from -30% to 200% and perform the perceptual quality based RDO.
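A sketch of the rescaling and the multiplier sweep follows; the offset grid shown is an illustrative sampling of the -30% to 200% interval, not the exact set used in the experiments.

```python
def rescale_lambda(lambda_mse, avg_d_ssim, avg_d_mse):
    """Scale the MSE-RDO multiplier to the SSIM distortion's range."""
    return lambda_mse * avg_d_ssim / avg_d_mse

def lambda_sweep(lambda_ssim, offsets=(-0.3, 0.0, 0.5, 1.0, 1.5, 2.0)):
    """Enumerate candidate multipliers around the rescaled value."""
    return [lambda_ssim * (1.0 + o) for o in offsets]
```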

For each QP, multiple RD sample points are generated [91]; each one (black sample points in Fig. 5-2) corresponds to an offset of \lambda_{SSIM}. They compose a local RD curve that describes the RD behavior of a given QP. Also, over different QPs, a family of local RD curves is generated to reflect the global RD behavior. Its envelope (red line in Fig. 5-2) on the lower-left side describes the best achievable RD behavior, since each RD point in the interior region is at least worse than two RD points (its horizontal and vertical projections) on the boundary. As shown in Fig. 5-3, an interior RD point R_s has the same distortion as R_l on the envelope but needs a larger bit rate, and also has the same bit rate as R_b on the envelope but results in larger distortion. Therefore, the envelope corresponds to the desired RD model. As shown in Fig. 5-2, the best RD curve based on perceptual quality metric RDO does not necessarily parallel that of the MSE based RDO [86].
Figure 5-2. Black points in the same marker: RD samples for a given QP but varying \lambda. Black points in different markers: RD samples for different QPs. Blue line: RD curve resulting from MSE based RDO. Purple line: RD curve resulting from perceptual quality based RDO with scaling associated with MSE-RDO. Red line: RD curve that is the bound of the best achievable perceptual quality based RD trade-off, enclosing all RD points.

5.4 Local RD Curve Fitting

Since the desired RD envelope is spanned by the family of local curves, we need to obtain the analytic form of each local RD curve before fitting the envelope. Based on the characteristics of RDO, we can infer some geometric properties of the local curve, as shown in Fig. 5-4: 1) It is monotonically decreasing. When choosing a larger \lambda_{SSIM}, the RDO tolerates distortion but penalizes bit rate, resulting in an RD point with a smaller bit rate but larger distortion. So if R1 > R2, then D1 < D2.

PAGE 90

Figure5-4. ThegeometriccharacteristicsofalocalRDcurve. formofacircleisasinEq. 5 ,where(R,D)areavailableRDsamples,c,dandearecoefcientsofthequadraticcurvetot.WeperformleastsquareregressiontosolvethecoefcientsasinEq. 5 fc(R,Djc,d,e)=R2+D2+cR+dD+e(5) 0BBBB@RiDi11CCCCA0BBBB@cde1CCCCA=0BBBB@)]TJ /F3 11.955 Tf 9.3 0 Td[(R2i)]TJ /F3 11.955 Tf 11.95 0 Td[(D2i1CCCCA()Ax=b()x=(ATA))]TJ /F4 7.97 Tf 6.58 0 Td[(1ATb(5)Fig. 5-6 showstheperformanceofthecirclecurve(theblueline)ttedfromthelocalRDsamples(Theblackmarkers). 5.5PiecewiseEnvelopeGenerationWeproposeapiece-wiseapproximationmethodtottheglobalRDenvelope.TheideaisthattheglobalRDenvelopecanbeapproximatedbyafamilyofpiece-wiseline 90

PAGE 91

Figure5-5. RDsamplesoverdifferentQPandvaryingLagrangemultipliers,videoBusandMobile. segments(bluelineinFig. 5-7 ),eachofwhichisonthecommontangentlineoftwoneighboringlocalRDcurves.SincethelocalRDcurvebelongstoacircle,weusethefollowingproceduretondthecommontangentlineoftwocircles.Supposethetwocircleshavethefollowingform, 91

PAGE 92

Figure5-6. SamplesfromvideoBuswiththesameQP,varyingLagrangemultipliers.Circleisusedasthettingmodel.Clockwise:QP=23,24,25,26,respectively. (x)]TJ /F3 11.955 Tf 11.96 0 Td[(x1)2+(y)]TJ /F3 11.955 Tf 11.95 0 Td[(y1)2=r21(x)]TJ /F3 11.955 Tf 11.96 0 Td[(x2)2+(y)]TJ /F3 11.955 Tf 11.95 0 Td[(y2)2=r22(5)where(x1,y1)and(x2,y2)arethecentersofthetwocircles,r1andr2aretheradiusofthetwocircles.Supposeitscommontangentlineisax+by+c=0,thena=RX)]TJ /F3 11.955 Tf 13.24 0 Td[(kYp 1)]TJ /F3 11.955 Tf 11.96 0 Td[(R2,b=RY+kXp 1)]TJ /F3 11.955 Tf 11.96 0 Td[(R2,c=r1(ax1+by1),whereX=x2)]TJ /F10 7.97 Tf 6.59 0 Td[(x1 p (x2)]TJ /F10 7.97 Tf 6.59 0 Td[(x1)2+(y2)]TJ /F10 7.97 Tf 6.58 0 Td[(y1)2,Y=y2)]TJ /F10 7.97 Tf 6.59 0 Td[(y1 p (x2)]TJ /F10 7.97 Tf 6.59 0 Td[(x1)2+(y2)]TJ /F10 7.97 Tf 6.59 0 Td[(y1)2,R=r2)]TJ /F10 7.97 Tf 6.59 0 Td[(r1 p (x2)]TJ /F10 7.97 Tf 6.59 0 Td[(x1)2+(y2)]TJ /F10 7.97 Tf 6.59 0 Td[(y1)2.BasedonthettedlocalcurvesinSection 5.4 andtheprocedureabove,weobtainthecommontangentlineforeverytwoneighboringlocalRDcurves.Fig. 5-8 showsthe 92

PAGE 93

Figure5-7. ThepiecewiseapproximationofRDenvelopebytangentlinesegments. tangentlineovertwolocalRDcurves.ThetangentlinedemonstratesthegradientofglobalRDenvelopeatQPofthetwoneighboringlocalRDcurves.Therefore,thefamilyoftangentlinesformsanapproximationoftheglobalRDenvelope.Theintersectionsofeverytwoneighboringtangentlinesareonthepiecewiseenvelope.AsshowninFig. 5-10 ,theredmarkersaresuchintersectionpointsandtheyarewellpositionedontheenvelopethatenclosesallRDsamples(blackmarkers). 5.6ExperimentResultsWecomparetheRDperformanceofourproposedperceptualRDOwith[ 86 ]andbaselineJM16.0[ 78 ].Intheexperiment,weuseH.264baselineproletoencodeeachvideo.TheencodingcongurationsareGOPstructureIPPPwith12framesasagroup,RDOenabled,maximum3referenceframesforintercodingandQPfrom20to36.OurproposedframeworkisimplementedinJM16.0andwetestvevideosequencesincifsize(352x288).AsshowninFig. 5-10 ,theRDcurveofourmethodlocatesinthemostlowerleftamongthethreeRDcurves,whichmeansthatourmethodoutperformsboth[ 86 ]andthebaselineJM.Inthelowbitraterange(QP28-36),ourmethodhasalargesavingmargin,with5%and15%bitratereduction.Inthehighbitraterange 93

PAGE 94

Figure5-8. ThecommontangentlineoftwolocalRDcurves(circlesegment).Left:QP=26.Right:QP=28. (QP20-27),sincethebitrateislessprioritizedthandistortion,allthreemethodsmaychoosethemostcomplexmodestoencodeablock,whichresultsinsimilardistortion.However,ourmethodstillcansave2%and12%bitratefromthetworeferencemethods.Table. 5-1 showsthatthebitratereductionrateofourproposedmethodcomparingwiththetworeferencemethods. 94

PAGE 95

Table5-1. Bitratereduction(%)ofourproposedRDOforinter-framecodingunderBaselineprole.Courtesyofthevideotracelibrary[ 3 ]. VideosQP28-36QP20-27 Ref-[ 86 ]Ref-JMRef-[ 86 ]Ref-JM Foreman-5.42-15.29-3.84-12.2Coastguard-6.97-12.56-2.27-9.10Bus-9.78-16.6-3.86-7.59Mobile-7.63-11.21-3.58-8.42Akiyo-6.25-10.63-2.61-9.86 95

PAGE 96

Figure5-9. PiecewiselinearapproximationoftheRDenvelope.Left:Bus.Right:Coastguard. 96

PAGE 97

Figure5-10. PerceptualRDOperformancecomparison.Red:theproposedRDO.Cyan:theframeworkin[ 86 ].Blue:JM-MSEbasedRDO. 97

PAGE 98

CHAPTER6CONCLUSIONThisdissertationstudiesfouradvancedvideoprocessingtechniquesinvideotransmissionsystem.Thersttwotechniques,videoretargetingandvideosummarization,enhancetheadaptivitywhentransmittingvideocontentacrossdifferentplatforms.Theothertwotechniques,perceptualqualityassessmentandperceptualbasedencoderoptimization,exploithumanvisualcharacteristicstorecongurevideotransmissionsystemcomponents.InChapter 2 ,theauthorpresentsanovelvideoretargetingsystemthatissuitableforlongvideoswithgenericcontents.Differentfrommanyexistingapproachesthatfocusonvisualinterestingnesspreservation,theauthoridentiesthatthetemporalretargetingconsistencyandnon-deformationplayadominantrole.Also,thestatisticalstudyonhumanresponsetotheretargetingscaleshowsthatseekingaglobalretargetingviewasinheterogeneousapproachesisnotnecessary.Therenedhomogeneousapproachforthersttimecomprehensivelyaddressestheprioritizedtemporalretargetingconsistencyandachievesthemostpossiblepreservationofvisualinterestingness.Inparticular,theauthorproposesavolumeretargetingcostmetrictojointlyconsiderthetwoobjectivesandformulatedtheretargetingasanoptimizationproblemingraphrepresentation.Adynamicprogrammingsolutionisgiven.Tomeasurethevisualinterestingnessdistribution,theauthoralsointroducesanon-linearfusionbasedattentionmodeling.Encouragingresultshavebeenobtainedthroughtheimagerenderingoftheproposedattentionmodelingandvideoretargetingsystemonvariousimagesandvideos.Thesubjectiveteststatisticallydemonstratesthattheproposedattentionmodelcanmoreeffectivelyandefcientlyextractsalientregionsthanconventionalmethodsandalsothevideoretargetingsystemoutperformsotherhomogeneousmethods. 98

PAGE 99

InChapter 3 ,theauthoraimsatasummarythatguaranteesviewers'understandingofthevideoplot.Theauthorproposesanovelmethodtorepresentthesemanticstructureasconceptsandinstancesandarguedthatagoodsummarymaintainsallconceptscomplete,balancedandsalientlyillustrated.Itisshownintheexperimentstheeffectivenessoftheproposedconceptdetectionmethod.Also,thehumansubjectivetestjustiesthestrengthofthesummarizationmethod.InChapter 4 ,avisualpsychologicalfactorbasedperceptualqualitymodelisproposedtomeasurethevideoqualitydegradedbythepacketlosseventsduringtransmission.Themodelfocusesonexploringtwodominantpsychologicalfactorsthatdirectlycorrelatewithhumanresponseinalinearfashion.Fortherstfactortheglitteringblockeffects,theauthorproposesastructuraledgedetectionmethodtodescribeitsstrength.Forthesecondfactorthehumanfacedistortion,theauthorproposesamotionvectorentropybasedfacedeformationestimationmethodtodescribeitsvalue.Thentheauthoruseslinearregressionwithcrossvalidationtotraintheperceptualmodel.Theexperimentsshowthattheproposedmethodcorrelateswiththeactualhumanscoremorethantheconventionalbitstream/QoSfactorbasedapproachandalsocanincreasepredictionaccuracyby4%intermsofRRMSE.InChapter 5 ,aperceptualRDOframeworkbasedonpiecewiselinearapproximationisproposed.TheauthorstartsfromtheMSE-RDOandrescaleandoffsetitsLagrangemultipliertosuitforthedynamicrangeofperceptualdistortionintheproposedperceptualRDO.BasedonthecollectedRDsamples,theauthorndstheenvelopecurvecorrespondtothebestachieveRDmodel.TheauthorthenapproximatestheRDenvelopewithpiecewiselinesegments,eachsegmentisfromacommontangentlineoftwocirclesttedfromRDsamples.ExperimentsillustratethattheproposedRDOfeaturingRDenvelopeapproximationoutperformstheconventionalmethodsby2%to5%. 99

PAGE 100

REFERENCES [1] M.Rubinstein,D.Gutierrez,O.Sorkine,andA.Shamir,Acomparativestudyofimageretargeting,ACMTransactionsonGraphics,vol.29,pp.160,2010. [2] X.HouandL.Zhang,Saliencydetection:Aspectralresidualapproach,Proc.ofIEEEComputerVisionandPatternRecognization,vol.20,pp.1,2007. [3] Videotracelibrary,Yuvvideosequences,http://trace.eas.asu.edu/yuv/,2010. [4] CBSBroadcastingInc.,Thebigbangtheory,http://the-big-bang-theory.com/,2009. [5] theBlenderinstitute,Bigbuckbunny,http://www.bigbuckbunny.org/,2008. [6] CableNewsNetwork,Studentnews,http://www.cnn.com/studentnews/,2009. [7] S.Kanumuri,P.Cosman,A.Reibman,andV.Vaishampayan,Modelingpacket-lossvisibilityinmpeg-2video,IEEETransactionsonMultimedia,vol.8,pp.341,2006. [8] C.Guo,Q.Ma,andL.Zhang,Spatio-temporalsaliencydetectionusingphasespectrumofquaternionfouriertransform,Proc.ofIEEEComputerVisionandPatternRecognization,pp.1,2008. [9] L.Itti,C.Koch,andE.Niebur,Amodelofsaliency-basedvisualattentionforrapidsceneanalysis,IEEETransactionsonPatternAnalysisandMachineIntelligence,vol.20,pp.1254,1998. [10] X.HouandL.Zhang,Dynamicvisualattention:searchingforcodinglengthincrements,AdvancesinNeuralInformationProcessingSystems,vol.21,pp.681,2008. [11] D.WaltherandC.Koch,Modelingattentiontosalientproto-objects,NeuralNetworks,vol.19,pp.1395,2006. [12] Y.Ma,X.Hua,L.Lu,andH.Zhang,Agenericframeworkofuserattentionmodelanditsapplicationinvideosummarization,IEEETransactionsonMultimedia,vol.7,pp.907919,2005. [13] L.Wolf,M.Guttmann,andD.Cohen-Or,Non-homogeneouscontent-drivenvideo-retargeting,Proc.ofIEEEInternationalConferenceonComputerVision,vol.7,pp.16,2007. [14] S.AvidanandA.Shamir,Seamcarvingforcontent-awareimageresizing,ACMTransactionsonGraphics,vol.26,pp.10,2007. [15] F.LiuandM.Gleicher,Videoretargeting:automatingpanandscan,Proc.ofACMMultimedia,pp.241,2006. 100

PAGE 101

[16] G.Hua,C.Zhang,Z.Liu,Z.Zhang,andY.Shan,Efcientscale-spacespatiotemporalsaliencytrackingfordistortion-freevideoretargeting,ComputerVision-ACCV2009,pp.182,2010. [17] T.Deselaers,P.Dreuw,andH.Ney,Pan,zoom,scantime-coherent,trainedautomaticvideocropping,Proc.ofIEEEComputerVisionandPatternRecogniza-tion,pp.1,2008. [18] M.Rubinstein,A.Shamir,andS.Avidan,Improvedseamcarvingforvideoretargeting,ACMTransactionsonGraphics,vol.27,pp.1,2008. [19] M.Grundmann,V.Kwatra,M.Han,andI.Essa,Discontinuousseam-carvingforvideoretargeting,Proc.ofIEEEComputerVisionandPatternRecognition,pp.569,2010. [20] M.Rubinstein,A.Shamir,andS.Avidan,Multi-operatormediaretargeting,ACMTransactionsonGraphics,vol.28,pp.23:1:11,2009. [21] Y.S.Wang,C.L.Tai,O.Sorkine,andT.Y.Lee,Optimizedscale-and-stretchforimageresizing,ACMTransactionsonGraphics,vol.27,pp.1,2008. [22] Y.S.Wang,H.Fu,O.Sorkine,T.Y.Lee,andH.P.Seidel,Motion-awaretemporalcoherenceforvideoresizing,ACMTransactionsonGraphics,vol.28,pp.127,2009. [23] P.Krahenbuhl,M.Lang,A.Hornung,andM.Gross,Asystemforretargetingofstreamingvideo,ACMTransactionsonGraphics,vol.28,pp.126:1:10,2009. [24] J.S.BoreczkyandL.A.Rowe,Comparisonofvideoshotboundarydetectiontechniques,JournalofElectronicImaging,vol.5,pp.122,1996. [25] N.V.PatelandI.K.Sethi,Videoshotdetectionandcharacterizationforvideodatabases,PatternRecognition,vol.30,pp.583,1997. [26] T.A.EllandS.J.Sangwine,Hypercomplexfouriertransformsofcolorimages,IEEETransactionsonImageProcessing,vol.16,pp.22,2007. [27] J.ShiandC.Tomasi,Goodfeaturestotrack,Proc.ofIEEEComputerVisionandPatternRecognization,vol.29,pp.593,1994. [28] P.J.RousseeuwandA.M.Leroy,Robustregressionandoutlierdetection,Wiley-IEEE,vol.589,2005. [29] DreamWorksAnimation,Madagascar,http://www.madagascar-themovie.com/main.php,2005. [30] S.Goferman,L.Zelnik-Manor,andA.Tal,Context-awaresaliencydetection,IEEETransactionsonPatternAnalysisandMachineIntelligence,vol.34,pp.1915,2012. 101

PAGE 102

[31] M.Cheng,G.Zhang,N.Mitra,X.Huang,andS.Hu,Globalcontrastbasedsalientregiondetection,IEEEConferenceonComputerVisionandPatternRecognition,pp.409,2011. [32] R.Rensink,Seeing,sensingandscrutinizing,VisionResearch,vol.40,pp.1469,2000. [33] MichaelKors,runwaymodel,http://www.youtube.com/watch?v=QvKHuZfJzN0,2010. [34] Y.Li,S.Lee,andetal.,Techniquesformoviecontentanalysisandskimming,IEEESignalProcessingMagazine,vol.23,pp.79,2006. [35] Y.Li,T.Zhang,andetal.,Anoverviewofvideoabstractiontechniques,HPLaboratoriesPaloAlto,Tech.ReportNo.HPL-2001-191s,2001. [36] D.DementhonandD.Doermann,Videosummarizationbycurvesimplication,ACMInterntionalConferenceonMultimedia,pp.211,1998. [37] Y.GongandX.Liu,Videosummarizationusingsingularvaluedecomposition,IEEEInternationalConferenceonComputerVisionandPatternRecognition,pp.174,2000. [38] M.Padmavathi,R.Yong,andet.al,Keyframe-basedvideosummarizationusingdelaunayclustering,InternationalJournalonDigitalLibraries,vol.6,pp.219,2006. [39] Z.CernekovaandIoannisPitas,Informationtheory-basedshotcut/fadedetectionandvideosummarization,IEEETransactionsonCircularSystemsforVideoTechnology,vol.16,pp.82,2006. [40] YuxinPengandChong-WahNgo,Clip-basedsimilaritymeasureforquery-dependentclipretrievalandvideosummarization,IEEETransactionsonCircularSystemsforVideoTechnology,vol.16,pp.612,2006. [41] ZhuLi,GuidoM.Schuster,andet.al,Minmaxoptimalvideosummarization,IEEETransactionsonCircuitsSystemsforVideoTechnology,vol.15,pp.1245,2005. [42] Y.F.Ma,X.S.Hua,andetal.,Agenericframeworkofuserattentionmodelanditsapplicationinvideosummarization,IEEETransactionsonMultimedia,vol.7,pp.907,2005. [43] C.W.Ngo,Y.F.Ma,andetal.,Videosummarizationandscenedetectionbygraphmodeling,IEEETransactionsonCircuitsSystemsforVideoTechnology,vol.15,pp.296,2005. 102

PAGE 103

[44] G.Evangelopoulos,A.Zlatintsi,andetal.,Videoeventdetectionandsummarizationusingaudio,visualandtextsaliency,IEEEInternationalCon-ferenceonAcousticsSpeechandSignalProcessing,pp.3553,2009. [45] Y.F.Ma,L.Lu,andetal.,Auserattentionmodelforvideosummarization,ACMConferenceonMultimedia,pp.533,2002. [46] L.Itti,C.Koch,andetal.,Amodelofsaliency-basedvisualattentionforrapidsceneanalysis,IEEETransactionsonPatternAnalysisandMachineIntelligence,vol.20,pp.1254,1998. [47] T.Lu,Z.Yuan,andetal.,Videoretargetingwithnonlinearspatial-temporalsaliencyfusion,IEEEInternationalConferenceonImageProcessing,pp.1801,2010. [48] T.Wang,Y.Gao,andetal.,Videosummarizationbyredundancyremovingandcontentranking,ACMInternationalConferenceonMultimedia,pp.577,2007. [49] D.Lewis,Naivebayesatforty:Theindependenceassumptionininformationretrieval,MachineLearning,Springer,pp.4,1998. [50] D.G.Lowe,Objectrecognitionfromlocalscale-invariantfeatures,IEEEInterna-tionalConferenceonComputerVision,pp.1150,1999. [51] U.VonLuxburg,Atutorialonspectralclustering,StatisticsandComputing,Springer,vol.17,pp.395,2007. [52] L.HerranzandJ.M.Martinez,Aframeworkforscalablesummarizationofvideo,IEEETransactionsonCircuitSystemsforVideoTechnology,vol.20,pp.1265,2010. [53] J.Asghar,F.LeFaucheur,andI.Hood,Preservingvideoqualityiniptvnetworks,IEEETransactionsonBroadcasting,vol.55,pp.386,2009. [54] InternationalTelecommunicationUnion(ITU),Perceptualvisualqualitymeasurementtechniquesformultimediaservicesoverdigitalcabletelevisionnetworksinthepresenceofareducedbandwidthreference,ITU-TRecommenda-tionJ.246,2008. [55] M.ClaypoolandJ.Tanner,Theeffectsofjitterontheperceptualqualityofvideo,ACMInternationalConferenceonMultimedia-Part2,pp.115,1999. [56] Objectiveperceptualmultimediavideoqualitymeasurementinthepresenceofafullreference,InternationalTelecommunication,vol.J.341,2011. [57] I.P.GunawanandM.Ghanbari,Reduced-referencevideoqualityassessmentusingdiscriminativelocalharmonicstrengthwithmotionconsideration,IEEETransactionsonCircuitsandSystemsforVideoTechnology,vol.18,pp.71,2008. 103

PAGE 104

[58] S.Argyropoulos,A.Raake,M.Garcia,andP.List,No-referencevideoqualityassessmentforsdandhdh.264/avcsequencesbasedoncontinuousestimatesofpacketlossvisibility,Proc.3rdInternationalWorkshopQualityofMultimediaExperience(QoMEX),pp.31,2011. [59] S.WinklerandP.Mohandas,Theevolutionofvideoqualitymeasurement:Frompsnrtohybridmetrics,IEEETransactionsonBroadcasting,vol.54,pp.660,2008. [60] T.L.Lin,S.Kanumuri,Y.Zhi,D.Poole,P.Cosman,andA.Reibman,Aversatilemodelforpacketlossvisibilityanditsapplicationtopacketprioritization,IEEETransactionsonImageProcessing,vol.19,pp.722,2010. [61] F.Pan,X.Lin,S.Rahardja,W.Lin,E.Ong,S.Yao,Z.Lu,andX.Yang,Alocally-adaptivealgorithmformeasuringblockingartifactsinimagesandvideos,ProceedingsoftheInternationalSymposiumonCircuitsandSystems,vol.3,pp.23,2004. [62] TaoLiu,YaoWang,J.M.Boyce,HuaYang,andZhenyuWu,Anovelvideoqualitymetricforlowbit-ratevideoconsideringbothcodingandpacket-lossartifacts,IEEEJournalofSelectedTopicsinSignalProcessing,vol.3,pp.280,2009. [63] Y.J.Liang,J.G.Apostolopoulos,andB.Girod,Analysisofpacketlossforcompressedvideo:Effectofburstlossesandcorrelationbetweenerrorframes,IEEETransactionsonCircuitsandSystemsforVideoTechnology,vol.18,pp.861,2008. [64] N.Staelens,G.VanWallendael,andetal.,No-referencebitstream-basedvisualqualityimpairmentdetectionforhighdenitionh.264/avcencodedvideosequences,IEEETransactionsonBroadcasting,vol.58,pp.187199,2012. [65] VideoQualityExpertsGroup(VQEG),Reportonthevalidationofvideoqualitymodelsforhighdenitionvideocontent,IEEESignalProcessingMagazine[Online].Available:http://www.its.bldrdoc.gov/vqeg/projects/hdtv/,pp.577,2010. [66] Livemobilequalitydatabase,http://live.ece.utexas.edu/research/Quality/index.htm,2012. [67] Xiph.orgvideotestmedia,http://media.xiph.org/video/derf/,2012. [68] Vqeghybridtestplan,http://www.its.bldrdoc.gov/vqeg/projects/hybrid-perceptual-bitstream/hybrid-perceptual-bitstream.aspx,2012. [69] PaulViolaandMichaelJ.Jones,Robustreal-timefacedetection,InternationalJournalofComputerVision,vol.57,pp.137,2004. [70] Weka,http://www.cs.waikato.ac.nz/ml/weka/index downloading.html,2012. 104

PAGE 105

[71] GaryJ.SullivanandThomasWiegand,Rate-distortionoptimizationforvideocompression,IEEESignalProcessingMagazine,vol.15,pp.74,1998. [72] TihaoChiangandYa-QinZhang,Anewratecontrolschemeusingquadraticratedistortionmodel,IEEETransactionsonCircuitsandSystemsforVideoTechnology,vol.7,pp.246,1997. [73] Hung-JuLee,TihaoChiang,andYa-QinZhang,Scalableratecontrolformpeg-4video,IEEETransactionsonCircuitsandSystemsforVideoTechnology,vol.10,pp.878,2000. [74] Liang-JinLinandAntonioOrtega,Bit-ratecontrolusingpiecewiseapproximatedrate-distortioncharacteristics,IEEETransactionsonCircuitsandSystemsforVideoTechnology,vol.8,pp.446,1998. [75] P.H.Westerink,R.Rajagopalan,andC.A.Gonzales,Two-passmpeg-2variable-bit-rateencoding,IBMJournalofResearchandDevelopment,vol.43,pp.471,1999. [76] ChengshengQue,GuobinChen,andJilinLiu,Anefcienttwo-passvbrencodingalgorithmforh.264,IEEEInternationalConferenceonCommunications,CircuitsandSystems,vol.1,pp.118,2006. [77] DimitriP.Bertsekas,Constrainedoptimizationandlagrangemultipliermethods,ComputerScienceandAppliedMathematics,Boston,1982. [78] H.264/avcsoftwarejm16.0,http://iphome.hhi.de/suehring/tml/,2012. [79] ThomasWiegandandetal.,Overviewoftheh.264/avcvideocodingstandard,IEEETransactionsonCircuitsandSystemsforVideoTechnology,vol.13,pp.560,2003. [80] KalvaHariandLakisChristodoulou,Usingmachinelearningforfastintrambcodinginh.264,ElectronicImaging2007,2007. [81] SiweiMa,WenGao,andYanLu,Rate-distortionanalysisforh.264/avcvideocodinganditsapplicationtoratecontrol,IEEETransactionsonCircuitsandSystemsforVideoTechnology,vol.15,pp.1533,2005. [82] Z.WangandA.C.Bovik,Meansquarederror:Loveitorleaveit?anewlookatsignaldelitymeasures,IEEESignalProcessingMagazine,vol.26,pp.98,2009. [83] WinklerStefan,AnimeshSharma,andDavidMcNally,Perceptualvideoqualityandblockinessmetricsformultimediastreamingapplications,Proc.InternationalSymposiumonWirelessPersonalMultimediaCommunications,pp.547,2001. [84] N.Staelens,S.Moens,W.VandenBroeck,I.Marien,B.Vermeulen,P.Lambert,R.VandeWalle,andP.Demeester,Assessingqualityofexperienceofiptv 105

PAGE 106

andvideoondemandservicesinreal-lifeenvironments,IEEETransactionsonBroadcasting,vol.56,pp.458,2010. [85] Z.Wangandetal.,Imagequalityassessment:Fromerrorvisibilitytostructuralsimilarity,IEEETransactionsonImageProcessing,vol.13,pp.600,2004. [86] Yi-HsinHuangandetal.,Perceptualrate-distortionoptimizationusingstructuralsimilarityindexasqualitymetric,IEEETransactionsonCircuitsandSystemsforVideoTechnology,vol.20,pp.1614,2010. [87] C.Wang,X.Mou,andL.Zhang,Aperceptualqualitybasedratedistortionmodel,2012FourthInternationalWorkshoponQualityofMultimediaExperience,pp.74,2012. [88] S.Wangandetal,Ssim-motivatedrate-distortionoptimizationforvideocoding,IEEETransactionsonCircuitsandSystemsforVideoTechnology,vol.22,pp.516,2012. [89] X.Wang,L.Su,Q.Huang,andC.Liu,Visualperceptionbasedlagrangianratedistortionoptimizationforvideocoding,IEEEInternationalConferenceonImageProcessing,pp.1653,2011. [90] G.Valenzise,S.Magni,M.Tagliasacchi,andS.Tubaro,No-referencepixelvideoqualitymonitoringofchannel-induceddistortion,IEEETransactionsonCircuitsandSystemsforVideoTechnology,vol.22,pp.605,2012. [91] BenJamesWiner,DonaldR.Brown,andM.MichelsKenneth,Statisticalprinciplesinexperimentaldesign,McGraw-Hill,1991. 106

PAGE 107

BIOGRAPHICALSKETCH ZhengYuanwasbornin1984inZhangjiakou,HebeiProvince,China.HereceivedhisB.S.degreeinelectronicandinformationengineeringatJilinUniversity,Changchun,Chinain2006andtheM.S.degreeinelectronicinformationandelectricalengineeringatShanghaiJiaoTongUniversity,Chinain2009.HereceivedhisPh.D.degreeinelectricalandcomputerengineeringattheUniversityofFlorida,Gainesville,USAinthesummerof2013.Hisresearchinterestsincludevideoandimageanalysis/processing,videocompression,multimedia,computervisionandmachinelearning. 107