
Scheduling Online Advertisements Using Information Retrieval and Neural Network/Genetic Algorithm Based Metaheuristics



SCHEDULING ONLINE ADVERTISEMENTS USING INFORMATION RETRIEVAL AND NEURAL NETWORK/GENETIC ALGORITHM BASED METAHEURISTICS

By

JASON DEANE

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

2006


Copyright 2006 by Jason Deane


This document is dedicated to my entire family, but especially to Amanda, Conner, mom and pawpaw. Amanda and Conner provided me with the necessary strength, determination and never-ending support and were my inspiration in pursuing and finishing my PhD. Mom and pawpaw are, without a doubt, the two most influential people in my life. For good and for bad, everything that I am is a result of my never-ending effort to model myself after these two amazing people. Pawpaw was the kindest and most sincere person that I have ever met and although he's in a better place now, I still think of him every day. My mother is the strongest and hardest working person that I know and without her many sacrifices, my life could have been completely different and I would never have had the opportunity to achieve this goal. Thank you!


ACKNOWLEDGMENTS

I would like to especially thank my wife Amanda for supporting and putting up with me throughout this process. I know that it was not easy. I would also like to thank our families for their never-ending support throughout this very challenging endeavor. In addition, I would like to thank my dissertation committee and the DIS department staff for their support and guidance. In particular I would like to thank and acknowledge my advisor, Anurag Agarwal, and my co-chair, Praveen Pathak, for their countless hours of training and support. I couldn't have done it without you!!


TABLE OF CONTENTS

ACKNOWLEDGMENTS .... iv
LIST OF TABLES .... vii
LIST OF FIGURES .... ix
ABSTRACT .... x

CHAPTER

1 INTRODUCTION AND MOTIVATION .... 1

2 ONLINE ADVERTISING .... 5
   2.1 Definitions and Pricing Models .... 5
   2.2 Literature Review .... 7

3 INFORMATION RETRIEVAL METHODOLOGIES .... 27
   3.1 Overview .... 27
   3.2 Data Pre-processing .... 30
   3.3 Vector Space Model .... 31
   3.4 Structural Representation .... 34
   3.5 WordNet .... 35

4 LARGE SCALE SEARCH METHODOLOGIES .... 38
   4.1 Overview .... 38
   4.2 Genetic Algorithms .... 43
   4.3 Neural Networks .... 47
   4.4 The No Free Lunch Theorem .... 52

5 RESEARCH MODEL(S) .... 54
   5.1 Problem Summary .... 54
   5.2 Information Retrieval Based Ad Targeting .... 56
   5.3 Online Advertisement Scheduling .... 64
      5.3.1 The Modified Maxspace Problem (MMS) .... 65
      5.3.2 The Modified Maxspace Problem with Ad Targeting (MMSwAT) .... 68


      5.3.3 The Modified Maxspace Problem with Non-Linear Pricing (MMSwNLP) .... 72
   5.4 Model Solution Approaches .... 74
      5.4.1 Augmented Neural Network (AugNN) .... 74
      5.4.2 Genetic Algorithm (GA) .... 78
      5.4.3 Hybrid Technique .... 82
      5.4.4 Parameter Selection .... 83
   5.5 Problem Set Development .... 84

6 RESULTS .... 86
   6.1 Information Retrieval Based Ad Targeting Results .... 86
   6.2 Discussion of the Information Retrieval Based Ad Targeting Results .... 101
   6.3 Online Advertisement Scheduling Results .... 102
      6.3.1 Modified Maxspace (MMS) Problem Results .... 104
      6.3.2 The Modified Maxspace with Ad Targeting (MMSwAT) Problem Results .... 105
      6.3.3 The Modified Maxspace with Non-Linear Pricing (MMSwNLP) Problem Results .... 107
   6.4 Discussion of the Online Advertisement Scheduling Results .... 109

7 SUMMARY, CONCLUSIONS AND FUTURE RESEARCH .... 110

APPENDIX

A GA AND AUGNN PARAMETER AND SETTING DEFINITIONS .... 113

B LIST OF ADVERTISED PRODUCTS AND SERVICES AND THEIR RESPECTIVE CHARACTERISTIC ARRAYS .... 114

C SAMPLE DOCUMENTS FOR ONE USER FROM THE IR BASED AD TARGETING PROCESS .... 119

LIST OF REFERENCES .... 149

BIOGRAPHICAL SKETCH .... 159


LIST OF TABLES

1 Structural Element Weighting Schemes .... 60
2 AugNN Parameter Values .... 84
3 GA Parameter Values .... 84
4 Hybrid Parameter Values .... 84
5 Summary of Mean Student Rankings for the 4 Selection Methods .... 88
6 T Test: Scheme 1 & Random Selection .... 89
7 T Test: Scheme 2 & Random Selection .... 89
8 T Test: Scheme 3 & Random Selection .... 90
9 Summary of Mean Student Rankings for the Three Weighting Schemes .... 91
10 T Test: Scheme 1-5 & Scheme 1-1 .... 92
11 T Test: Scheme 1-5 & Scheme 1-2 .... 92
12 T Test: Scheme 1-5 & Scheme 1-3 .... 93
13 T Test: Scheme 1-5 & Scheme 1-4 .... 93
14 T Test: Scheme 2-5 & Scheme 2-1 .... 94
15 T Test: Scheme 2-5 & Scheme 2-2 .... 94
16 T Test: Scheme 2-5 & Scheme 2-3 .... 95
17 T Test: Scheme 2-5 & Scheme 2-4 .... 95
18 T Test: Scheme 3-5 & Scheme 3-1 .... 96
19 T Test: Scheme 3-5 & Scheme 3-2 .... 96
20 T Test: Scheme 3-5 & Scheme 3-3 .... 97


21 T Test: Scheme 3-5 & Scheme 3-4 .... 97
22 T Test: Scheme 1 & Scheme 2 .... 99
23 T Test: Scheme 1 & Scheme 3 .... 99
24 T Test: Scheme 2 & Scheme 3 .... 100
25 Problem Results .... 104
26 MMSwAT Comparison of Results .... 105
27 MMSwNLP Comparison of Results .... 107


LIST OF FIGURES

1 A Screen Print of Yahoo's Shopping Page. Notice the advertising banner down the right hand side of the Web page .... 10
2 Pictorial Representation of Information Flow in Traditional Print Advertising .... 18
3 Pictorial Representation of the Information Flow in Online Advertising .... 19
4 Geometric Representation of the VSM .... 32
5 Classes of Search Methods (Basic Model Borrowed from [54]) .... 40
6 Pictorial Representation of the Cerebral Cortex [91] .... 48
7 Pictorial Representation of a Basic Feed Forward ANN [91] .... 50
8 Selected Parents Prior to Crossover .... 80
9 Resulting Offspring .... 81
10 Child 2 Prior to Mutation .... 81
11 Child 2 After Mutation .... 81
12 Q-Q Plot of Student Response Values .... 87


Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

SCHEDULING ONLINE ADVERTISEMENTS USING INFORMATION RETRIEVAL AND NEURAL NETWORK/GENETIC ALGORITHM BASED METAHEURISTICS

By

Jason Deane

August 2006

Chair: Anurag Agarwal
Cochair: Praveen Pathak
Major Department: Decision and Information Sciences

As a result of the recent technological proliferation, online advertising has become a very powerful and popular method of marketing; industry revenue is growing at a record pace. One very challenging problem faced by those on the publishing side of the industry is ad targeting. In an attempt to maximize revenue, publishers try their best to expose Web surfers to a set of advertisements which are closely aligned with their interests and needs. In this work, we present and test an information retrieval based ad targeting technique which shows promise as an alternative solution method for this problem. A second, very difficult, challenge faced by online ad publishers is the development of an ad schedule which makes the most efficient use of their available advertisement space. We introduce three versions of this very difficult problem and test several potential solution techniques for each of them.


CHAPTER 1
INTRODUCTION AND MOTIVATION

Despite residual fears from the dot-com decline of 2000, many seem to be once again embracing the Web. Worldwide Internet usage is at an all-time high, broadband access is soaring and many households are turning away from their televisions in favor of their computer screens [1]. The proliferation of the fiber optic telecommunication infrastructure left over from the telecom boom of the 1990s has made broadband connectivity accessible and affordable for almost any family. As a result, the online experience has been vastly improved and is extremely popular with the technology generation. According to Tom Hyland, Partner and New Media Group Chair, PricewaterhouseCoopers, this has created a mass audience of Internet users which simply cannot be ignored by advertisers. Corporate America is beginning to realize the potential importance of expanding its advertising portfolio to include the online channel. This sentiment is echoed by many corporate executives. David Garrity, a financial analyst at Caris & Company, a Wall Street investment firm, asserts that "Every indication is that corporate advertising budgets are increasingly allocated to the Internet" [2, p.1]. Ty Montague, Wieden & Kennedy's chief creative officer, believes "Whereas people are zapping most TV advertising, the Net is amazing for drawing people in, if our ingenuity is up to it" [3, p.1]. These comments are typical of the current claims about the growth of the online advertising channel. The recent trend in online advertisement spending fully supports these claims. According to a recent PricewaterhouseCoopers report, industry revenue for the calendar


year 2005 totaled $12.5 billion, which represents a 23% year-over-year increase over 2004 results [4]. Industry-wide revenue has increased in 12 of the last 13 quarters. In addition, future projections of widespread mobile Internet access demand are expected to provide an additional revenue boost for the industry. It is estimated that online advertisement revenues for the US alone will grow to $18.9 billion by 2010 [4]. Motivated by this upward trend in Internet advertising demand, many companies (e.g., Google, Yahoo, AOL, etc.) have adopted a business model which is heavily dependent upon the revenue stream generated from their publishing of online advertisements [5]. As a result, efforts to improve the online advertisement scheduling process are in great demand. In personal conversations with Doron Welsey, Director of Industry Research for the Interactive Advertising Bureau (IAB), and Rick Bruner, research analyst for DoubleClick, a leading online advertising agency, both indicated that they have been inundated with companies seeking help with their online advertising efforts. They also indicated that research which helps overcome the IT-related challenges currently facing the industry is critically needed and therefore is likely to be important to industry experts and academicians alike. In this dissertation, we apply information retrieval and artificial intelligence methodologies in an attempt to provide efficient, appealing solution alternatives to one of the most difficult and compelling problems facing publishers: online banner advertisement scheduling. Given the popularity of banner advertising and the considerable revenue which it generates, even a small improvement in the efficiency and/or quality of the scheduling process could result in a considerable increase in revenue.


The goals of this thesis are threefold. First, we propose a methodology which, based on a user's recent Web surfing behavior, provides an estimate of his or her level of interest in a particular advertisement. Second, we introduce three new real-world variations of the strongly NP-hard online advertisement scheduling optimization problem. Finally, we develop and test several heuristic and meta-heuristic solution algorithms for each of the new models that we propose. Information retrieval (IR) is an area of research which attempts to extract usable information from textual data. We propose a method by which information retrieval and ontological methodologies are utilized to exploit a user's recent Web surfing history in an effort to categorize ads based on the user's predicted level of interest. IR has historically been employed in the field of library sciences, but it has recently gained favor in many other fields, including Internet search and cyber security. The power of IR is its ability to handle textual information. Information retrieval has been applied in many domains, including document sorting, document retrieval, inference development and query response. We use IR techniques to leverage the textual representation of a user's html Web surfing history in the creation of a weighted characteristic array for each user. We create similar arrays for each advertisement and use several similarity measures to strategically create a schedule of user-advertisement assignments. The basic online advertisement scheduling optimization problem has been addressed in the literature. Because it is an NP-hard problem [6], most of the variations have been limited to linear pricing models which seek to maximize the number of ads served or the number of times an ad is clicked. We introduce several new model


variations designed to address realistic issues such as nonlinear pricing and advertisement targeting. Obviously, the NP-hard nature of the basic linear problem means that these variations will be even more difficult to solve optimally. We develop and test several heuristic algorithms which may allow efficient generation of near-optimal solutions for these models. Machine learning (ML) is the study of computer algorithms that improve automatically through experience [7]. Machine learning is a subset of artificial intelligence which has received considerable attention due to the recent increase in available computing power. Machine learning methods such as decision trees, logit functions, and neural networks have been applied successfully to a wide array of problems, including optimization problems, and have therefore proven to be valuable tools in the development of heuristic solution approaches. We combine neural network and genetic algorithm techniques with several base heuristics in an effort to provide efficient, robust solution techniques for multiple variations of the online advertisement scheduling problem.


CHAPTER 2
ONLINE ADVERTISING

This chapter presents a general overview of the online advertising industry and the associated research. In section 2.1, we provide a review of the basic definitions and pricing models. In section 2.2, we provide a review of the online advertisement scheduling literature.

2.1 Definitions and Pricing Models

There are three primary participants in online advertising. At the top of the chain is the advertiser. This is a company that enters into an agreement with a publisher in order to enlist the publisher's assistance in the serving of its online advertisements. More often than not, the ads are delivered to users of the publisher's Web pages. The publisher is a company that expends resources to publish online advertisements in an effort to generate revenue. The customer is the individual who browses Web pages and may or may not respond to an ad in a manner that is verifiable, such as clicking the ad. Publishers can be paid by advertisers for their service according to a number of possible schemes. The first category of pricing models is often referred to as Impression Based Pricing Models, because the publisher is paid entirely on the basis of serving the ad, which is called an impression, on the Web page, and not on the basis of any action taken by the customer. Thus, the publisher is paid whether or not the customer shows any interest in the ad. The most basic impression based model is CPM Linear Pricing. CPM is short for cost per mille (mille is Latin for 1,000). In this scheme, the publisher is paid a fixed fee for each 1,000 ads that are served. The fee is based on the size of the ad


and increases in a linear fashion. In addition, the rate may differ depending on the chosen Web pages (sports, news, etc.), the time of day, etc.; however, many publishers price each slot identically in an effort to simplify the accounting and scheduling operations. Larger ads decrease the publisher's flexibility to schedule ads within a fixed banner area; therefore, publishers might expect a premium for larger ads. CPM Nonlinear Pricing allows for this expectation. It is the same as CPM Linear Pricing except that the pricing function with respect to advertisement size is either a concave or a step function instead of a linear function. The third type of CPM model is called Modified CPM. This is a model which is being used by publishers in an effort to increase the revenue which they receive for the advertising space on their generic/non-targeted Web pages. Advertising space for targeted pages such as sports, automotive, and real estate is in high demand; however, the space on the other, non-targeted pages is much harder to sell. As a result, publishers have started trying to charge a premium for the advertising space on these non-targeted pages by employing consumer classification. The basic idea is that a user is classified based on his or her click behavior and then served ads based on this classification. As an example, a user who visits the sports page more than some threshold number of times is classified as a sports person. The publisher then targets this consumer when he or she visits one of the non-targeted pages and serves the consumer a sports related ad. The revenue that the publisher is able to demand in this situation is not as high as it would have been had the ad been served on the sports page, but it is higher than the publisher could have received for another random ad placement on the non-targeted page. Under all of the CPM based models, the advertiser bears all of the financial risk. This is because the advertiser must


pay the publisher the agreed upon rate regardless of how well the advertisement campaign performs. The other primary category of pricing models, in contrast to impression-based models, is Performance Based Models. These are models within which the publisher is paid based solely on some pre-defined measure of ad campaign performance. Performance Based CPC (Cost Per Click) is a scheme in which the publisher is paid a fee each time the advertiser's ad is clicked. Performance Based CPS (Cost Per Sale) is a scheme in which the publisher is paid a fee each time one of the served ads results in a sale. In Performance Based CPR (Cost Per Registration), the publisher is paid a fee each time a consumer sets up an account with the advertiser as a result of the advertisement. Under all of the performance based models, the publisher bears all of the financial risk. This is because the publisher is paid nothing for simply publishing the advertisements. Instead, he or she is paid only if the pre-defined performance criteria are met. Finally, Hybrid Pricing Models are pricing models which combine two or more of the above models. Often, this type of model will include the CPM model and one or more of the performance based models in an effort to establish an equitable risk sharing situation between the publisher and the advertiser. These have become very popular in industry.
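To make these schemes concrete, the short Python sketch below computes a publisher's revenue under linear CPM, step-function (nonlinear) CPM, CPC and a simple hybrid. Every rate, fee, size threshold and count in it is a hypothetical illustration chosen for readability, not an industry figure.

```python
# Minimal sketch of the pricing schemes described above; all rates,
# thresholds and counts are hypothetical illustrations.

def cpm_linear(impressions, ad_size, rate_per_unit_size=0.05):
    # Linear CPM: the fee per 1,000 impressions grows linearly with ad size.
    return (impressions / 1000.0) * ad_size * rate_per_unit_size

def cpm_step(impressions, ad_size, steps=((60, 2.0), (120, 5.0), (240, 9.0))):
    # Nonlinear CPM with a step function: larger ads pay a premium fee tier.
    fee = steps[-1][1]
    for max_size, tier_fee in steps:
        if ad_size <= max_size:
            fee = tier_fee
            break
    return (impressions / 1000.0) * fee

def cpc(clicks, fee_per_click=0.25):
    # Performance based CPC: the publisher is paid only when the ad is clicked.
    return clicks * fee_per_click

def hybrid(impressions, ad_size, clicks):
    # Hybrid: half-weight CPM component plus half-weight CPC component,
    # splitting the risk between publisher and advertiser.
    return 0.5 * cpm_linear(impressions, ad_size) + 0.5 * cpc(clicks)

if __name__ == "__main__":
    # 100,000 impressions of a 120-unit-tall ad with a 1.5% click rate.
    print(cpm_linear(100_000, 120))     # 600.0
    print(cpm_step(100_000, 120))       # 500.0
    print(cpc(1_500))                   # 375.0
    print(hybrid(100_000, 120, 1_500))  # 487.5
```

Under the two CPM functions the publisher's revenue depends only on impressions served, whereas the CPC and hybrid functions tie part or all of the revenue to campaign performance, mirroring the risk allocation just described.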


2.2 Literature Review

The process of scheduling online advertisements can be a very challenging and dynamic task which is characterized by a wide array of obstacles and constraints. The set of constraints and difficulties differs vastly from publisher to publisher, depending on their effort level and the methods with which they choose to address the problem. Factors which affect the relative complexity of the problem include which pricing models the publisher chooses; which, if any, targeting efforts the publisher attempts to employ; and which additional artificial intelligence techniques the publisher chooses to build into its scheduling algorithms. Thus far, the primary focus of academic researchers has been on addressing the most basic of these situations: a CPM pricing model with no applied intelligence or targeting. From a publisher's perspective, advertising space is a precious non-renewable resource which, if used efficiently, can drive both current revenues and future demand. The two primary goals, from an advertisement scheduling perspective, are to minimize the amount of unused advertising space and to maximize the probability that a customer will have interest in the advertisements which he or she is served. Depending on the agreed upon pricing model, these goals can take on different levels of importance with respect to the maximization of revenue. Publishers are currently compensated based on some combination of the amount of Web space and the number of advertisement impressions delivered for an advertiser, and/or their score on a pre-defined set of performance based measures such as the number of clicks, the number of leads, or the number of sales. The original pricing model for the online advertising industry was the CPM model. The CPM model is a basic pricing structure, adopted from traditional print advertising, within which the publisher is compensated at an agreed upon rate for every thousand advertisement impressions delivered. This model was very popular in the 1990s and is still being used by many companies [8]. Thus far, the academic research literature has been primarily focused on models which are based on this pricing strategy. The seminal online advertisement scheduling paper by Adler, Gibbons and Matias [6] introduced two basic problems, the Minspace and the Maxspace problem, and proved that both are NP-hard in the strong sense. The Maxspace problem is formulated based on


the CPM pricing model. The objective of the Maxspace problem is to find a feasible schedule of ads such that the total occupied slot space is maximized, given that the slots have a fixed capacity and the ads are of differing sizes and differing display frequencies. There are several assumptions which are inherent in the formulation of these two models. First, it is assumed that each banner/time slot is the same size, S. Second, it is assumed that all of the ads have the same width, which is equal to the width of the banner. This is common practice when the banner ad space is found on either side of a Web page. See Figure 1 for an example from Yahoo's shopping page. Next, it is assumed that each ad has a height which is less than or equal to the height of the banner. It is also assumed that any user who accesses the Web site during a given time slot will see the same set of advertisements. In addition, the authors assume that there is a positive linear relationship between advertisement size and the revenue which is generated. Therefore, the objective is to find a feasible set of ads which maximizes the used advertising space. An IP formulation of the Maxspace problem is as follows:

Max \sum_{j=1}^{N} \sum_{i=1}^{n} s_i x_{ij}

subject to:

(1) \sum_{i=1}^{n} s_i x_{ij} \le S, \quad j = 1, 2, ..., N

(2) \sum_{j=1}^{N} x_{ij} = w_i y_i, \quad i = 1, 2, ..., n

(3) x_{ij} \in \{0, 1\}, where x_{ij} = 1 if ad i is assigned to ad slot j and 0 otherwise

(4) y_i \in \{0, 1\}, where y_i = 1 if ad i is assigned and 0 otherwise


where:
n = total number of advertisements available for scheduling over the planning period
N = total number of available time slots in the planning period
S = banner height
s_i = height of advertisement i, i = 1, 2, ..., n
w_i = display frequency of advertisement i, i = 1, 2, ..., n

Figure 1. A Screen Print of Yahoo's Shopping Page. Notice the advertising banner down the right hand side of the Web page.
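For readers who want to experiment with the formulation above, the sketch below builds the Maxspace IP with the open-source PuLP modeling library; the choice of library and the six-ad instance are ours and purely hypothetical, not part of the original work.

```python
# A minimal sketch of the Maxspace IP above, built with the open-source
# PuLP modeling library (our tooling choice; the instance is hypothetical).
from pulp import LpProblem, LpMaximize, LpVariable, lpSum, LpBinary

S = 10                  # banner height
N = 4                   # number of time slots
s = [3, 5, 2, 4, 6, 2]  # ad heights s_i
w = [2, 1, 3, 2, 1, 4]  # required display frequencies w_i
n = len(s)

prob = LpProblem("Maxspace", LpMaximize)
x = LpVariable.dicts("x", (range(n), range(N)), cat=LpBinary)
y = LpVariable.dicts("y", range(n), cat=LpBinary)

# Objective: maximize the total occupied slot space.
prob += lpSum(s[i] * x[i][j] for i in range(n) for j in range(N))

# (1) The ads assigned to a slot cannot exceed the banner height S.
for j in range(N):
    prob += lpSum(s[i] * x[i][j] for i in range(n)) <= S

# (2) and (4): a selected ad appears exactly w_i times; a rejected ad never.
for i in range(n):
    prob += lpSum(x[i][j] for j in range(N)) == w[i] * y[i]

prob.solve()
for i in range(n):
    slots = [j for j in range(N) if x[i][j].value() == 1]
    print(f"ad {i} (height {s[i]}): slots {slots}")
```

Because each x_{ij} is a single binary variable, constraint (3), at most one copy of an ad per slot, is enforced automatically by the variable definition.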


Constraint (1) ensures that the combined height of the set of ads scheduled for each banner slot does not exceed the available space. Another assumption of the model is that if an advertisement is chosen, the number of delivered impressions for that ad must exactly equal its pre-defined frequency, w_i. Constraints (2) and (4) combine to ensure that this relationship is guaranteed. Constraint (3) ensures that at most one copy of each ad can appear in any given slot. In other words, it is not acceptable to schedule the same ad multiple times in a given banner slot. This constraint represents a very important aspect of the online advertisement scheduling problem which distinguishes it from other related bin packing and scheduling problems. The Minspace problem is very similar to the Maxspace problem. However, there are a couple of significant differences. An IP formulation of the Minspace problem is as follows:

Min S

subject to:

(1) \sum_{i=1}^{n} s_i x_{ij} \le S, \quad j = 1, 2, ..., N

(2) \sum_{j=1}^{N} x_{ij} = w_i y_i, \quad i = 1, 2, ..., n

(3) x_{ij} \in \{0, 1\}, where x_{ij} = 1 if ad i is assigned to ad slot j and 0 otherwise

(4) y_i \in \{0, 1\}, where y_i = 1 if ad i is assigned and 0 otherwise

where:
S = size of the slots
s_i = size of ad i, i = 1, 2, ..., n
w_i = frequency of ad i


One primary difference is that this problem does not assume that the size of the banner/time slot is fixed. Instead, the objective of this problem is to schedule all of the ads while minimizing the height of the tallest slot. The authors postulate that this problem may be useful during the Web site design phase. For this problem, they developed the Largest Size Least Full (LSLF) algorithm, which is a 2-approximation, and they developed a Subset-LSLF algorithm for the Maxspace problem. The LSLF algorithm, which can be implemented in O(\sum_{i=1}^{n} w_i + sort(A)) time, where sort(A) denotes the time required to sort the ads, is a basic greedy heuristic. The steps are detailed below.

Largest Size Least Full (LSLF) Algorithm
1. Sort the ads in descending order of size.
2. Assign each of the ads in sorted order; ad i is assigned to the w_i least full slots.

The Subset-LSLF algorithm is very similar. The steps are as follows.

Subset Largest Size Least Full (Subset-LSLF) Algorithm
1. Classify the ads into two subsets based on their relative size: if s_i = S, the ad is placed in subset B_s; otherwise, it is placed in subset B_k.
2. Calculate the volume of advertisements in each subset, \sum s_i w_i.
3. Choose the subset with the largest volume and assign its ads as long as there is sufficient space available. For the B_k subset, use the LSLF algorithm for placement.

The authors show that this is a 2-approximation algorithm for the special case where ad widths are divisible and the profit of each ad is proportional to its volume (width times display frequency). One limitation of their work is that nearly all of their meaningful results pertain to this special case where the ad sizes are divisible.
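The following minimal Python sketch of the LSLF placement rule is our own illustration on a hypothetical instance; a heap keeps the least-full-slot lookup cheap, and we include the capacity check that applies when LSLF placement is embedded in Subset-LSLF for the Maxspace setting (in the pure Minspace setting the slots have no fixed cap).

```python
# Our own minimal sketch of the LSLF placement rule (hypothetical instance);
# the capacity check applies in the capacitated (Maxspace) setting.
# Assumes each frequency w_i is at most the number of slots N.
import heapq

def lslf(sizes, freqs, S, N):
    """Assign each ad, largest first, to its w_i least full slots."""
    slots = [(0, j) for j in range(N)]   # (used height, slot index) min-heap
    heapq.heapify(slots)
    schedule = {j: [] for j in range(N)}
    # Step 1: consider the ads in descending order of size.
    for i in sorted(range(len(sizes)), key=lambda i: -sizes[i]):
        # Step 2: tentatively take the w_i least full slots.
        taken = [heapq.heappop(slots) for _ in range(freqs[i])]
        if all(used + sizes[i] <= S for used, _ in taken):
            for used, j in taken:        # the ad fits everywhere it is needed
                schedule[j].append(i)
                heapq.heappush(slots, (used + sizes[i], j))
        else:                            # reject the ad; restore the slots
            for entry in taken:
                heapq.heappush(slots, entry)
    return schedule

print(lslf(sizes=[6, 4, 3, 2], freqs=[2, 3, 1, 4], S=10, N=4))
```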


Dawande, Kumar and Sriskandarajah [5] propose additional heuristic solution techniques for both the Maxspace and the Minspace problems. For the Minspace problem, they suggest a linear programming relaxation (LPR) based algorithm and a Largest Frequency Least Full (LFLF) heuristic. The authors prove that the LPR is a 2-approximation algorithm and that this bound is asymptotically tight. In addition, they prove that the integrality gap for the algorithm is bounded above by s_max and that the time complexity is O(n^3 N^3 L + (1 + N)nN), where L is the length of the binary encoding of LP_min. The LFLF heuristic is very similar to the LSLF heuristic designed by Adler et al. [6], except that the ads are assigned in non-increasing order of their frequency instead of non-increasing order of their size. The time complexity of the algorithm is O(n log n + nN log N), which is comparable to that of LSLF; however, the performance bounds are slightly better. The performance bound for the LFLF algorithm for the Minspace problem is stated as a bound r on the ratio f/f*, the LFLF solution value over the optimal IP solution value; r takes one of two forms depending on whether w_0 = min{w_1, ..., w_n} is at least N, with the second form also involving S and s_0 = min{s_1, ..., s_n}. The bound is tight in both cases. The authors also introduce two heuristics, MAX1 and MAX2, for the Maxspace problem. These algorithms involve a decomposition of the set of ads into two subsets based on their frequencies. Based on the total weight of the ads in each subset, the algorithm gives priority to one of the subsets. All of the ads from that subset are assigned using the LSLF heuristic and then the other subset of ads is assigned likewise. Max1 has a time complexity of O(n log n + nN log N) and a performance bound of f/f* \ge 1/4 - 1/(4N). Max2, which is a little more complicated, has a time complexity of O(nN^3 + nN log N) and a performance bound of f/f* \ge 3/10. The authors tested their


heuristics against a test bed of problems. They created 10 sets of problems, with each set containing 10 problems. The number of slots, N, ranged over [25, 100] and the ad sizes ranged over [S/3, 2S/3]. One important contribution of their work is that they remove the restrictive limitations on advertisement size which were present in the work by Adler et al. [6]. The average percentage gaps between the heuristic and optimum solutions for the LFLF, Max1 and Max2 heuristics were approximately 30%, 15% and 20%, respectively. Freund and Naor [9] propose additional heuristic-based solution techniques for the Maxspace and the Minspace problems. Following the trend set by Dawande et al. [5], this work also allows arbitrary ad sizes, but maintains all of the other assumptions originally set out by Adler et al. For the Minspace problem, they propose the Smallest Size Least Full (SSLF) heuristic. Their method is very similar to that of Adler et al.; however, their heuristic considers the ads for placement iteratively in non-decreasing order of size, which is the exact opposite of the procedure proposed by Adler et al. For the Maxspace problem, they propose a (3+ε)-approximation algorithm which combines a knapsack relaxation and the SSLF heuristic. In addition, they provide solution techniques for two special cases: ad widths not exceeding one half of the display area, and each advertisement either occupying the entire area or else having a width of at most one half of the display area. No test results are provided for any of the proposed algorithms. Menon and Amiri [10] propose and test Lagrangean relaxation and column generation solution techniques for a variation of the Maxspace problem. One major difference in their work is that they relax the advertisement frequency constraint. Instead


of requiring each ad to appear a pre-defined number of times, they set an upper bound on the number of times that each ad can appear. In their explanation, the authors make a concerted effort to distinguish the scheduling horizon from the planning horizon. The scheduling horizon corresponds to the length of time within which the publisher commits to deliver a set number of advertisement impressions for their consumers, whereas the planning horizon is the period of time for which we are trying to schedule a set of ads to fill the banner space. They claim that the planning horizon should be shorter than the scheduling horizon in order to provide scheduling flexibility for the publisher. According to the authors, if these horizons are of unequal length as they recommend, the proposed upper bound on the ad frequency should correspond to the number of ad impressions left to be delivered for a given advertiser during the scheduling horizon. For example, assume that our planning horizon is one week and that the problem at hand is to develop an advertisement schedule for the third week of September. Let us also assume that we have promised Dell that we will deliver 1000 impressions of their ad during the month of September and that thus far we have delivered only 100. In this situation, the upper bound for the number of times that Dell's ad could be scheduled during the planning horizon (the third week of September) is 900; however, there is no minimum requirement. The proposed relaxation definitely provides additional flexibility and in doing so simplifies the complexity of the problem considerably. However, it also creates another set of potential problems. In the hypothetical example above, with the model formulation provided by the authors, there is no way to guarantee that Dell's 1000 impressions would be delivered within the agreed upon time frame. For obvious reasons, this is probably not a desirable situation. The authors make a very compelling argument for their version of


the Maxspace problem, and it is likely that there are business situations within which their model would be extremely useful. However, the discussed limitation should be carefully considered prior to its application. To test their heuristics, the authors created a large data set consisting of 1500 problems. The number of advertisements and the number of time slots ranged over [20, 200], and the size of the banner ranged over [40, 100]. In addition, the authors varied the selection process for the w_i values, choosing from several different uniform distributions. They applied the column generation and the Lagrangean procedures to the entire data set. In addition, they combined the column generation procedure with a greedy preprocessing heuristic and tested it against the entire data set. Their testing indicated that the column generation procedure performs much better than the Lagrangean relaxation procedure on their data set and that the initialization heuristic only enhanced the column generation procedure's dominance. Dawande, Kumar and Sriskandarajah [11] propose three improved heuristic solution techniques for the Minspace problem. These solutions carry slightly better performance bounds than those presented in their earlier work. They introduce algorithms Min1 and Min2, which are both slight adaptations of the linear programming relaxation solution (LPR) from their earlier work [12]. Each algorithm involves running the LPR heuristic iteratively with contrasting stopping criteria. Min1 has a time complexity of O(n^3 N^3 L) and a performance bound that improves slightly on the LPR bound of 2. Min2 offers a slightly better performance bound, 3/2, but pays a cost in increased time complexity, O(n^4 N^3 L). In addition, they offer a heuristic solution for the online version of the Minspace problem. This version requires that decisions concerning the scheduling of individual ads be made without prior knowledge of the ads which will arrive in the


future. They recommend the First Come Least Full (FCLF) heuristic, which schedules each ad as it arrives, assuming that there is sufficient space, in the least full time slots. This algorithm has a performance bound of 2 - (1/N). The authors do not test their heuristics. Kumar, Jacob and Sriskandarajah [13] developed and tested three new techniques for the standard Maxspace problem. First, they proposed the Largest Size Most Full (LSMF) heuristic, which is based on the Multifit algorithm developed by Coffman, Garey and Johnson [14] as a solution technique for the classical bin packing problem. This algorithm first finds the maximum slot fullness and then removes ads until a feasible schedule is achieved. The ads are removed based on their relative volume (w_i s_i) in non-decreasing order. The authors point out that as the problem size grows, in terms of the number of time slots, N, and the number of advertisements, n, to a size comparable to that experienced in industry, the basic heuristics that have been proposed are not very efficient. Therefore, they turn to the world of meta-heuristics, proposing a Genetic Algorithm (GA) based solution technique. GAs are directed global search meta-heuristics based on the process of natural selection. GA based solutions, in many cases, are extremely successful when applied to global optimization problems. For a more in-depth review, please see chapter 4. Lastly, they propose a hybrid algorithm which combines the GA meta-heuristic with the LSMF and SUBSET-LSLF base heuristics.
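To convey the flavor of such an approach, the sketch below is our own minimal GA illustration, not the algorithm of [13]: a bitstring chromosome selects which ads to schedule, an LSLF-style decoder places the selected ads, and the filled banner space serves as the fitness.

```python
# Our own minimal GA sketch for Maxspace (not the algorithm of [13]):
# a bitstring selects ads, an LSLF-style decoder places them, and the
# filled banner space is the fitness. The instance is hypothetical.
import random

def decode(bits, sizes, freqs, S, N):
    """LSLF-style placement of the selected ads; returns the filled space."""
    heights = [0] * N
    filled = 0
    for i in sorted((i for i in range(len(bits)) if bits[i]),
                    key=lambda i: -sizes[i]):
        slots = sorted(range(N), key=lambda j: heights[j])[:freqs[i]]
        if all(heights[j] + sizes[i] <= S for j in slots):
            for j in slots:
                heights[j] += sizes[i]
            filled += sizes[i] * freqs[i]
    return filled

def ga(sizes, freqs, S, N, pop_size=30, gens=100, p_mut=0.05):
    n = len(sizes)
    pop = [[random.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]

    def fitness(bits):
        return decode(bits, sizes, freqs, S, N)

    for _ in range(gens):
        nxt = []
        while len(nxt) < pop_size:
            p1 = max(random.sample(pop, 3), key=fitness)  # tournament select
            p2 = max(random.sample(pop, 3), key=fitness)
            cut = random.randrange(1, n)                  # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [b ^ (random.random() < p_mut) for b in child]  # mutate
            nxt.append(child)
        pop = nxt
    return max(pop, key=fitness)

sizes, freqs = [6, 4, 3, 2, 5, 3], [2, 3, 1, 4, 2, 2]
best = ga(sizes, freqs, S=10, N=4)
print(best, decode(best, sizes, freqs, 10, 4))
```

A hybrid in the spirit described above would seed the initial population with solutions produced by base heuristics such as LSMF or SUBSET-LSLF rather than with purely random bitstrings.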


The authors test each of their proposed algorithms and the SUBSET-LSLF algorithm developed by Adler, Gibbons and Matias [6] against two randomly generated data sets. The first data set consists of 40 small problems and the second consists of 150 large problems. The number of time slots for the smaller problems ranges over [5, 10]; for the larger problems the range was [10, 100]. It should be noted that CPLEX was unable to provide an optimal solution for any of the larger problems in reasonable time. As anticipated, although their time requirements were a little more demanding, the meta-heuristic and hybrid models performed extremely well, dominating the performance of the heuristics for both data sets. The hybrid model was the clear winner in terms of solution quality. In its infancy, the industry embraced the CPM pricing model and used it relatively effectively. However, over time many stakeholders recognized one primary difference between online advertising and print advertising which motivated a move to new, more equitable pricing models: a difference in the flow of information. In traditional print media, information primarily flows in only one direction, as described pictorially below.

Figure 2: Pictorial Representation of Information Flow in Traditional Print Advertising (Advertiser → Publisher → Customer)

The advertiser provides the publisher with the advertisement and target audience and the publisher provides the advertisements to the customers. At this point, the flow of information, for all intents and purposes, is over. This makes it very difficult to analyze the effectiveness of a particular ad campaign. In an effort to overcome this problem, it is common practice to attempt to correlate periodic revenue/sales trends with adaptations to the marketing strategy. However, due to the plethora of potential causal factors, establishing the true level of dependence between the two movements is very difficult and often all but impossible. In contrast, the flow of information in online advertising is bidirectional, as described pictorially below.


Figure 3: Pictorial Representation of the Information Flow in Online Advertising (Advertiser ↔ Publisher ↔ Customer)

The advertiser provides the publisher with the advertisement and target audience, the publisher provides the chosen customers with the advertisements, and the customers, via their actions, provide the publisher and the advertiser with immediate performance feedback. Common consumer activities of particular interest include clicking on the ad, setting up an account with the advertiser, and/or making a purchase. Unlike the interaction in traditional media advertising, this two-way flow of information makes it extremely easy to measure the effectiveness of an online ad campaign. As a result, performance based pricing schemes such as the CPC, CPS or the CPA have become extremely popular as the industry searches for a more equitable risk sharing situation [15]. Several academic researchers have acknowledged this recent industry trend to incorporate performance measures into the pricing models. The authors of papers in the second stream of research have adapted their problem descriptions to account for this performance based pricing scheme. The next series of papers reviewed are all focused on a pure CPC pricing model, and therefore their objective functions attempt to maximize the number of clicks and ignore the amount of space used. Langheinrich et al. [16] assume that every customer has recently entered search keyword(s) into a search engine and that the publisher has access to this list of keywords. They propose a simple iterative method to estimate the probability of click-through c_ij for


each ad/keyword pair based on historical click behavior. Given the resulting probability matrix, they use a linear programming approach to solve the following LP:

Max \sum_{i=1}^{n} \sum_{j=1}^{m} k_i c_{ij} d_{ij}

subject to:

\sum_{i=1}^{n} k_i d_{ij} = h_j, \quad j = 1, ..., m

\sum_{j=1}^{m} d_{ij} = 1, \quad i = 1, ..., n

d_{ij} \ge 0, \quad i = 1, ..., n; \; j = 1, ..., m

where:
d_{ij} = probability that ad j will be displayed for keyword i
h_j = desired display frequency for ad j
m = total number of ads
n = number of keywords in the current corpus
k_i = input probability for keyword i
c_{ij} = click-through rate of ad j for keyword i

The objective of the problem is to maximize the likelihood that the delivered advertisement will be clicked. The first constraint is a frequency constraint which ensures that each ad is served the correct number of times. This is the same constraint that is present in the Maxspace problem. The second constraint makes sure that the display probabilities sum to unity for each keyword. The authors tested their solution model through a series of simulations. Their artificially generated data set had 32 ads and 128 keywords. One potential limitation pointed out by the authors is that the model is extremely sensitive to the accuracy of the click-through probabilities. This could cause a problem, given the inherent variability of these probability estimates. They propose to avoid unwanted ad domination by placing a floor under the display probability of each of the ads.
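As an illustration of this LP (our own sketch, not the authors' implementation), the code below solves a tiny hypothetical instance with scipy.optimize.linprog; because linprog minimizes, the click-through objective is negated, and a display-probability floor of the kind just described could be imposed simply by raising the lower bounds on the d_ij.

```python
# A sketch of the LP above on a tiny hypothetical instance (our own
# illustration, not the authors' implementation). linprog minimizes,
# so the click-through objective is negated.
import numpy as np
from scipy.optimize import linprog

n, m = 3, 2                             # n keywords, m ads
k = np.array([0.5, 0.3, 0.2])           # keyword probabilities k_i
h = np.array([0.6, 0.4])                # desired display frequencies h_j
c = np.array([[0.02, 0.05],             # c[i][j]: CTR of ad j for keyword i
              [0.04, 0.01],
              [0.03, 0.03]])

# Decision variables d_ij, flattened row-major into a vector of length n*m.
obj = -(k[:, None] * c).ravel()

A_eq, b_eq = [], []
for j in range(m):                      # sum_i k_i d_ij = h_j (ad frequency)
    row = np.zeros(n * m)
    row[j::m] = k
    A_eq.append(row)
    b_eq.append(h[j])
for i in range(n):                      # sum_j d_ij = 1 (keyword probabilities)
    row = np.zeros(n * m)
    row[i * m:(i + 1) * m] = 1.0
    A_eq.append(row)
    b_eq.append(1.0)

res = linprog(obj, A_eq=np.array(A_eq), b_eq=np.array(b_eq),
              bounds=[(0, 1)] * (n * m))
print(res.x.reshape(n, m))              # display probabilities d_ij
```

Note that feasibility requires the desired frequencies h_j to sum to one when the k_i do, since summing the frequency constraints over j gives \sum_j h_j = \sum_i k_i.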


This ensures that each ad has some chance of being selected. This problem is often referred to as the exploration/exploitation trade-off in the academic literature. The test results showed that the proposed method improved the cumulative click-through rates by approximately one percent over the random ad selection procedure. This procedure may work well with smaller problems. However, as the number of keywords grows to a realistic size, the search space will become very large, and we would anticipate that the performance of the proposed LP based technique would suffer. This model may be a good choice for a publisher who has selected a pure CPC pricing scheme; however, it lacks several constraints, and this limits its real-world applicability. The model fails to limit overselling and fails to prohibit the same ad from being displayed simultaneously in the same banner. Tomlin [17] proposes an alternative nonlinear entropy-based approach to overcoming the exploration/exploitation problem mentioned in the work by Langheinrich et al. [16]. This model avoids unrealistic solutions which show ads only to a very narrow subset of users; however, its applicability is still somewhat limited in that it ignores the prevalent space limitations. Similar to the work by Langheinrich [16], Chickering [18] proposes a system which maximizes the click-through rate given only advertisement frequency quotas. Instead of using keywords, they partition the ad slots into predictive segments or clusters. Each cluster/ad combination has an associated probability of click-through. They use an LP based approach to solve for a maximum revenue generating ad schedule based on these probabilities and the limiting frequency constraints. They also acknowledge the exploration/exploitation problem and attempt to overcome the issue by clustering the click-through probabilities. Their method was tested on the msn.com Web site and it


performed favorably, with respect to time and revenue, against the manual scheduling method that was in use at the time. Nakamura and Abe [19] identify several weaknesses of the LP based approach proposed in the 1999 work which they co-authored with Langheinrich et al. [16]. The authors propose potential solution techniques for each of these limitations, including the exploration/exploitation issue, the data sparseness concern, and the multi-impression issue. In an effort to overcome the exploration/exploitation issue, they propose substituting the Gittins index, an index developed by Gittins [20] which maximizes the expected discounted rewards for the multi-armed bandit problem, in place of the estimated click-through rates within the objective function. They also recommend the use of an interior point LP method and an alternative lower bounding technique for determining the relative display probabilities. In an effort to deal with the large cardinality of the search space, they propose a clustering algorithm for the attributes, thereby reducing the relative problem size. Lastly, they develop a queuing method in an effort to eliminate the possibility of the same ad being shown multiple times in the same banner slot. As in their previous work, the authors tested their proposed techniques via a series of simulations on their artificially generated data set. Recall that this data set is relatively small, having only 32 ads and 128 keywords. The new technique performed well in comparison with their original LP model and in comparison with a random selection method; the average click-through rates were 5.3%, 4.8% and 3.5%, respectively. Yager [21] proposes an intelligent agent system for the delivery of online Web advertisements. The system utilizes a probability-based theme to select the


advertisements to deliver. The publishers are to share demographic data about their Web customers with the advertisers. The advertisers, via a fuzzy logic based intelligent agent, use this information to bid on advertising units with the publisher. The agents iteratively adapt their bids based on the recurrent information about the site visitors. The number of units won by a given advertiser determines the probability that its ads will be chosen. The publisher then uses a random number generator and the probability matrix to select which advertisements to serve. Unfortunately, Yager's method was not tested. One potential challenge in applying Yager's method is the difficulty of collecting the necessary demographic data. Privacy laws make it very hard to collect good demographic data of the kind recommended by the authors. Another method of achieving a similar goal, one which has come under somewhat less scrutiny and which may be a promising way to improve advertisement selection, is to analyze a customer's surfing behavior. As part of this research, we propose a framework to analyze the raw html from a customer's recent click history using WordNet, a lexical database, and several information retrieval techniques. It is quite evident that the two streams of online advertisement research covered thus far are quite distinct, each having its own primary focus. The first stream is focused on addressing the space limitations of banner advertisement scheduling, taking into account the fact that banner ads are often of different sizes. Given that Web space is at a premium, it is very common for ad prices to vary by size. Therefore, allowing different size ads opens up the market to firms who may not be able to afford the entire banner. While doing so increases revenue, it also creates an obvious scheduling problem, which the authors of the first stream have chosen to address. Under a


pure CPM model, which is the focus of this first stream of research, the advertiser absorbs practically all of the risk. The publisher is paid the same rate regardless of the performance of the ad campaign; therefore, from a revenue maximization point of view, the publisher is focused simply on delivering as many ads as possible. This is obviously not an ideal situation for the advertisers. The authors of the papers in the second research stream have instead chosen to focus on the issue of attempting to create a schedule of ads which maximizes a performance based measure and ignores the space constraint. Specifically, these papers focus on the pure CPC pricing model, where the publisher is paid a set fee each time a user clicks on an advertiser's ad. Under a pure performance based model such as this, the publisher absorbs all of the risk. The advertiser stands to lose very little regardless of the level of effort which they devote to the relationship. Given that the overall advertisement campaign performance is directly dependent upon the quality of the products provided by both the advertiser (ad, product, offer, etc.) and the publisher (Web site content, incentives, targeting effort, etc.), either of these pure pricing models may lack the correct monetary incentives to maximize the efficiency of the agreement. In an effort to achieve a more equitable risk sharing situation, many companies are adopting a hybrid model which often includes the CPM model and one or more of the performance based pricing schemes. According to industry experts, this type of model enhances the efficiency of the relationship by improving monetary motivation for both parties. We hope to bridge these two streams of research, introducing methodology which addresses both the Web advertisement space limitations and the performance based pricing models.


Widespread adoption of the performance based pricing models seems to have provided the expected additional motivation. Publishers and advertisers are expending an enormous amount of effort to improve their probability of serving ads to interested consumers, while avoiding inconveniencing those who are uninterested. This is in the best interest of all of the stakeholders (publishers, advertisers and customers). Common efforts include, but are not limited to, geographical targeting, content targeting, time targeting, bandwidth targeting, complement scheduling, competitive scheduling and frequency capping (please see chapter 5 for a more detailed description of these practices). The overall goal is to identify a subset of Internet customers who may be interested in a particular advertiser's product and to serve that advertiser's ad accordingly. Given that the average click rate is less than 2%, this is a monumental task; however, as a result of the incredible potential benefits, the devotion of time and effort is well justified. These efforts complicate the task of ad scheduling and therefore need to be considered. In this research, we will extend the current literature by introducing several of these complexities and their resulting formulations, while at the same time proposing and testing several artificial intelligence based heuristic and meta-heuristic solution techniques for each model. Current academic research in online advertisement scheduling has provided a solid foundation upon which we can build. The models introduced thus far are still commonly used in industry; therefore, this work is very important. However, since the vast majority of the industry is attempting, with limited success, to tackle a more complicated mix of these factors, there is quite a bit of work left to be accomplished. We see this as a


great opportunity for the academic community and therefore will attempt to introduce and provide potential solution techniques for several more complicated models.


CHAPTER 3
INFORMATION RETRIEVAL METHODOLOGIES

This chapter presents an overview of the field of information retrieval (IR). As this field is very broad and multidisciplinary, we focus primarily on the subsets which are relevant to our research. In section 3.1, we provide a basic introduction and a general overview of the field of IR research. In section 3.2, we briefly discuss several common data pre-processing methods. In section 3.3, we introduce the vector space model. In section 3.4, we cover structural representation, and in section 3.5 we introduce lexical databases with a focused coverage of WordNet.

3.1 Overview

Information retrieval (IR) is focused on solving the issues involved with representing, storing, organizing, and providing access to information [22]. The underlying goal of IR is to provide a user with information which is relevant to his or her indicated topic of interest or query. Obviously, this is a very broad and daunting task. Through the early 20th century, this area of research was of interest to a very small group of people, primarily librarians and information experts. Their goal was to improve the methods by which a library patron was provided information/books relevant to his or her topic of interest. However, as a result of many incredible technological advances, the last few decades have seen the focus and reach of IR broaden substantially. No longer are we limited to the information that is available in our local library. Thanks to advances in electronic


The Web has become a massive distributed repository of knowledge and information which seems to grow in popularity and size every day. Today it is just as easy to find information by electronically querying a database in Hong Kong as it is to go to a local library to search for the information. In many respects, it is even easier. Similarly, corporate employees often find that their company's valuable information resources, which are likely to be widely dispersed around the globe and used to be impossible to locate, are now readily accessible via their corporate intranet. This ease of access is welcomed by users, as the quantity of information readily accessible to each person has grown exponentially; however, in turn, so has the challenge of determining which pieces of this information are actually relevant. Consequently, never has the task of information retrieval been more challenging or more important. IR has become a mainstream research area which is found in almost every discipline and is of great interest to academicians, individuals and corporations. A much more up-to-date definition of information retrieval, which acknowledges many of these drastic changes, is provided by Wikipedia [23, p.1], where information retrieval is defined as "the art and science of searching for information in documents, searching for documents themselves, searching for metadata which describes documents, or searching within databases, whether relational stand-alone databases or hypertext networked databases such as the Internet or intranets, for text, sound, images or data."

A basic information retrieval system consists of three main parts: the user's query, the ranking function and the corpus of information.


The job of the ranking function is to successfully match each query with the best subset of documents from the available set. This is accomplished by ranking the documents based on their respective relevance levels. This is a very difficult task and has therefore received considerable attention from the academic community. The most popular ranking model is the vector space model, which was introduced by Salton in his seminal work [24]. This model is discussed in detail in the next section. Recently, another very promising research stream by Fan et al. [25-29] highlights the potential for using genetic programming to actively discover a good ranking function. This technique is definitely deserving of further review. Other very active areas of IR research include, but are not limited to, query expansion, relevance feedback and literature-based discovery.

Query expansion is a research area focused on improving our ability to understand and respond to user queries. Query expansion attempts to improve a query by expanding it to include additional terms which are expected to improve the system's ability to respond [30]. This area is particularly relevant as a result of the dramatic popularity of Internet search engines.

Relevance feedback is an area of research which attempts to analyze a user's relevance judgments with respect to the documents that are returned as a result of his or her queries. The hope is that this relevance judgment information can be used to estimate a user's profile, which will help to improve the system's ability to respond to future queries by attempting to infer the user's real information need [31, 32].

Literature-based discovery uses information retrieval techniques in an attempt to uncover hidden connections between two indirectly related domains. The seminal work in the area was conducted by Swanson [33].


In his work, Swanson discovered that fish oil can be used as a successful treatment for Raynaud's syndrome. This technique has become very popular and has found a special niche in the area of biomedical sciences.

3.2 Data Pre-processing

Prior to employment of the vector space model or a similar technique, the raw data is often pre-processed in an effort to eliminate data deemed to be of little informational significance. Two very common pre-processing methods are stopping and stemming. Stopping is a method by which all of the stop words are removed from each document and query. Stop words are words such as "the," "and," "a," and "or," which are deemed to be of very little value with respect to the content representation of the document. Using these words as index terms would only dilute the pool of keywords. Removal of stop words also has a secondary benefit of compression, which considerably reduces the size of the indexing structure. In doing so, computational complexity is often drastically improved. After removing the stop words, the remaining words are commonly stemmed. Stemming is a process by which terms are converted to their base form by removing all of their affixes (suffixes and prefixes). A computer cannot tell that "cooking," "cooked" and "pre-cooked" are essentially the same word; however, after being stemmed, all three would be converted into their base word, "cook." This is very important because it enhances the process of determining the relative importance of a particular term such as "cook." If the words were left in their original form, the relative importance of "cook," which is commonly based on the number of occurrences of the term, would be underestimated. These pre-processing steps often provide significant improvements in the efficiency of the process.
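To make these two pre-processing steps concrete, the following minimal sketch applies stopping and stemming using the NLTK toolkit. The choice of NLTK and of the Porter stemmer is an assumption made purely for illustration; this research does not prescribe a particular implementation.

# Illustrative stopping-and-stemming sketch (assumes NLTK is installed
# and the stopword corpus has been downloaded via nltk.download('stopwords')).
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer

STOP_WORDS = set(stopwords.words("english"))
stemmer = PorterStemmer()

def preprocess(text):
    """Lowercase, drop stop words (stopping), then reduce terms to their base form (stemming)."""
    tokens = [t.strip(".,;:!?\"'()") for t in text.lower().split()]
    tokens = [t for t in tokens if t and t not in STOP_WORDS]   # stopping
    return [stemmer.stem(t) for t in tokens]                    # stemming

# "cooking," "cooks" and "cooked" all reduce to the same stem, "cook"
print(preprocess("The chef was cooking what the cooks had cooked"))
# -> ['chef', 'cook', 'cook', 'cook']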


3.3 Vector Space Model

A primary goal of information retrieval is to select, from a corpus of documents, the subset which is relevant to a user's query. A very powerful IR method for achieving this task is the vector space model [24, 34]. The process begins by transforming each document and query into a series of vectors by counting the frequency of occurrence of each keyword within each document and query. These vectors are normalized (a number of normalization methods exist) and are then used to determine the relevance of each document. The power of this process is rooted in its ability to transform the textual aspects of the documents and queries into a series of quantitative representations. The vector space model made major improvements over its predecessor, the Boolean model, by allowing for partial matching. Next, we provide a more formal representation of the vector space model.

The vector space model is a theoretically well-grounded model which is easily interpreted based on its geometric properties. Each document and query is represented by a vector of key terms in $n$-dimensional space. A query $q_j$ and a document $d_k$ are represented as

$$q_j = (w_{1,j}, w_{2,j}, \ldots, w_{n,j})$$
$$d_k = (w_{1,k}, w_{2,k}, \ldots, w_{n,k})$$

where $n$ is the total number of terms in the collection and $w_{i,j}$ represents the weight assigned to term $i$ for document (or query) $j$. The vector space model evaluates the relative importance of document $d_k$ to query $q_j$ based on the degree of similarity between the two corresponding $n$-dimensional vectors, $q_j$ and $d_k$ [35]. In order for this model to be meaningful, the vectors must be normalized. The dot product of the two normalized vectors, which gives the cosine of the angle between them (see figure 4), will be 1 if the two are identical and 0 if they are orthogonal.


The similarity measure between document $d_k$ and query $q_j$ is

$$sim(q_j, d_k) = \frac{\sum_{i=1}^{n} w_{i,j}\, w_{i,k}}{\sqrt{\sum_{i=1}^{n} w_{i,j}^2}\; \sqrt{\sum_{i=1}^{n} w_{i,k}^2}}$$

This similarity score, also called the retrieval status value (RSV), is calculated for each document-query combination and is used to rank the documents. A document's RSV score is used as a proxy measure of its relevance for a given query. The documents are ranked based on their RSV scores and served to the user in descending order.

Figure 4: Geometric Representation of the VSM (the vectors $q_j$ and $d_k$ separated by an angle in term space)

One of the most important steps in the vector space model is finding a good set of index term weights, $w_i$. The index term weights are responsible for providing an accurate estimation of the relative importance of the keywords within the collection. Without a good set of index term weights, the VSM loses its effectiveness very quickly. In her seminal work on this problem, Sparck Jones [36] introduced the TF-IDF function, which is still the most widely used and is considered by many to be the most powerful index weighting function. Although many content-based features are available within the vector space model that may be used to compute the index term weights, the two that are most common, and the ones used in the TF-IDF function, are the term frequency (tf) and the inverse document frequency (idf).


The basic TF-IDF function is as follows:

$$w_{ij} = tf_{ij} \times \log\frac{N}{df_i}$$

The term frequency, $tf_{ij}$, is calculated by counting the frequency of occurrence of term $i$ in document $j$. The larger the tf, the more important the term is considered to be in describing the document or query. The inverse document frequency is calculated as

$$idf_i = \log\frac{N}{df_i}$$

where $N$ represents the total number of documents in the collection and $df_i$ represents the total number of documents within which term $i$ appears. The basic intuition behind the idf is that a keyword which appears in very few documents is likely to be of greater value in classifying those documents than a keyword which appears in all of the documents. The idf scores are assigned accordingly: a keyword which appears in every document is assigned an idf score of 0, while a keyword appearing in very few documents receives a much higher idf score. By combining the two, the TF-IDF function gives the greatest weight to terms which occur with high frequency within a very small number of documents.

The vector space model has stood the test of time. Although many alternative ranking methods have been proposed since its inception in the late 1960s, the general consensus is that the VSM is as good as or better than all of its competitors [22]. Although no method has been able to take its place, several attempts to improve the basic vector space model are gaining in popularity. Two of these efforts which are of particular interest involve the inclusion of structural and lexical information within the model.
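The following short sketch renders the TF-IDF weighting and the cosine similarity measure directly from the formulas above. It is a bare-bones illustration rather than production retrieval code; vectors are kept as sparse dictionaries for readability.

import math
from collections import Counter

def tfidf_vectors(docs):
    """Weight term i in document j by tf_ij * log(N / df_i)."""
    N = len(docs)
    df = Counter()                       # document frequency of each term
    for doc in docs:
        df.update(set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: tf[t] * math.log(N / df[t]) for t in tf})
    return vectors

def cosine(u, v):
    """RSV: dot product of two sparse vectors over the product of their norms."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = (math.sqrt(sum(w * w for w in u.values()))
            * math.sqrt(sum(w * w for w in v.values())))
    return dot / norm if norm else 0.0

docs = [["cook", "recipe", "oven"], ["cook", "stove"], ["stock", "bond"]]
vecs = tfidf_vectors(docs)
print(cosine(vecs[0], vecs[1]))   # positive: the cooking documents share "cook"
print(cosine(vecs[0], vecs[2]))   # 0.0: no terms in common

Note that a term appearing in every document receives a weight of zero, exactly as the idf discussion above prescribes.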


3.4 Structural Representation

The traditional VSM considers each document as a simple bag of words, leveraging only the resulting textual representation. This method has proven to be very useful and effective, but many researchers, including Halasz [37], hypothesized that there might be additional information which is overlooked by the basic VSM. This additional information is found in the basic structure of the document. The fundamental idea is that the location in a document where a term appears may provide additional information as to how valuable that term may be in developing a characteristic representational vector for the document. Consider the basic structure of an HTML document as an example. An HTML document commonly consists of a series of independent sections such as the header, keywords, title, body, anchor, and abstract. From a structural representation point of view, a term which appears in the header might be more important than one which appears in the anchor. Alternatively, a term which appears in both the body and the anchor may be more important than one which appears only in the title. A number of researchers, including Navarro and Yates [38, 39] and Burkowski [40], have developed alternative models which incorporate document structure into the term relevance calculation. Although many of these methods are criticized as being somewhat narrowly focused and lacking in generalizability, the general consensus acknowledges that this structural representation definitely contains important information and should therefore be considered. Accordingly, we incorporate this type of information in our research model.
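As a rough illustration of how structural information can be folded into term weighting, the sketch below counts each term occurrence at a weight determined by the HTML section in which it appears. The use of the BeautifulSoup parser and the particular section weights are assumptions made for the example; the weighting schemes actually tested in this research are presented in chapter 5.

from collections import Counter
from bs4 import BeautifulSoup

SECTION_WEIGHTS = {"title": 0.3, "body": 0.2, "keywords": 0.5}   # hypothetical weights

def structural_term_weights(html):
    """Accumulate term weights according to the section in which each term appears."""
    soup = BeautifulSoup(html, "html.parser")
    meta = soup.find("meta", attrs={"name": "keywords"})
    sections = {
        "title": soup.title.get_text() if soup.title else "",
        "body": soup.body.get_text(" ") if soup.body else "",
        "keywords": meta.get("content", "") if meta else "",
    }
    weights = Counter()
    for section, text in sections.items():
        for term in text.lower().split():
            weights[term] += SECTION_WEIGHTS[section]
    return weights

html = ('<html><head><title>cooking tips</title>'
        '<meta name="keywords" content="cook recipes"></head>'
        '<body>cook pasta tonight</body></html>')
print(structural_term_weights(html))
# "cook" scores 0.5 (keywords) + 0.2 (body) = 0.7, more than any body-only term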


3.5 WordNet

In the previous section, we described the potential value of including structural information within the VSM. Lexical information may also be very useful. The primary goal of a lexical reference system is to provide its users with word relationships: when a user inputs a word, the system provides a response which summarizes how that word is related to other words. One system which incorporates lexical analysis is WordNet. "WordNet is an online lexical reference system whose design is inspired by current psycholinguistic theories of human lexical memory. English nouns, verbs, adjectives and adverbs are organized into synonym sets, each representing one underlying lexical concept. Different relations link the synonym sets" [41, p.1]. The most basic semantic relation upon which WordNet is built is synonymy [42]. Synsets, or sets of synonyms, form the basic building blocks of the system. For example, "hat" is included in the same synset (synsets are also called concepts) as "lid" and "chapeau." Although the synonymy between terms forms the basic structure of WordNet, the system is also organized based on the relationships between differing synsets. These relationships are used to form lexical hierarchies of concepts. For example, consider the terms robin, bird and animal. The important relation that characterizes the relationship between these terms is hyponymy. Hyponymy is the relation of subordination which represents the "is-a" relationship. Robin is a hyponym (subordinate) of bird, and bird is a hyponym of animal. These types of relationships are hierarchical in nature and therefore are easily represented in the form of an inverted tree. The root node is represented by the most general term, with the terms in each layer of nodes down the tree having a narrower focus or scope. In our simple example, animal would be the root node, bird would fall in the first layer and robin would be in the second layer of nodes. Hyponymy is only one of many lexical relations present in WordNet.


Other common relationships include the part-of, has-part, member-of and entails relationships. The current version of WordNet, version 2.1, includes 117,097 nouns, 11,488 verbs, 22,141 adjectives, 4,601 adverbs and 117,597 synsets [41]. Lexical systems such as WordNet are gaining in popularity and are finding their way into many different fields of research.

Lexical reference systems have been used within information retrieval for many different purposes, including word sense disambiguation [43-45] and semantic tagging [46, 47]; however, the use that is most pertinent to our research involves text selection. Many researchers, including Voorhees and Hou [48] and Gonzalo et al. [49], have shown that the text selection process can be vastly improved by utilizing a lexical reference system such as WordNet to enhance the process of developing a vector representation for documents and queries. The basic idea is very similar to that of stemming. In stemming, we convert terms into their root words in an effort to avoid underestimating the importance of a particular term. In contrast, the goal of this research stream is to consolidate terms based on their synonymic relationships for the same purpose. With stemming, we make the argument that "cooks" and "cooked" should not be treated as separate words and that doing so would underestimate the relative importance of the term "cook." Extending this example, how should the terms "cook" and "prepare" be treated? Researchers have shown that, in many cases, combining synonyms such as these enhances the performance of the traditional vector space model. This task can be accomplished by expanding the keyword indexing space to include synsets instead of being limited to just the terms. Lexical systems such as WordNet have proven to be very powerful tools which can often improve information retrieval research models. This is only a brief introduction to WordNet; Fellbaum [42] provides a much more thorough review of the system.
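The synset and hypernym relations described above can also be inspected programmatically. The brief sketch below uses NLTK's WordNet interface, which is one of several available front ends and is assumed here only for illustration; exact synset contents depend on the WordNet version installed.

# Assumes NLTK is installed and the WordNet corpus has been downloaded
# via nltk.download('wordnet').
from nltk.corpus import wordnet as wn

# A synset groups synonymous terms: "hat" shares a concept with "chapeau" and "lid".
print(wn.synset("hat.n.01").lemma_names())    # e.g., ['hat', 'chapeau', 'lid']

# Hyponymy forms the inverted tree: robin is-a bird is-a animal.
node = wn.synsets("robin")[0]
chain = []
while node.hypernyms():
    node = node.hypernyms()[0]               # climb one "is-a" level
    chain.append(node.name())
print(chain)   # climbs through thrush ... bird ... animal ... entity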


CHAPTER 4
LARGE SCALE SEARCH METHODOLOGIES

This chapter introduces the field of global optimization/global search. Section 4.1 provides a general overview. Sections 4.2 and 4.3 provide a detailed analysis of two powerful metaheuristic techniques, genetic algorithms and neural networks, with specific emphasis on the variations that will be employed in our research. The final section, section 4.4, introduces and discusses the implications of the No Free Lunch Theorem.

4.1 Overview

Optimization is an extremely active field of research which lies at the interface of applied mathematics, computer science and operations research. This area has received considerable attention during the last several decades, with researchers devoting a considerable amount of effort to the development of improved solution techniques. This is especially true for combinatorial optimization, which is a subset of the set of global optimization problems. Combinatorial problems, which are commonly found in many functional areas including operations management, information technology and telecommunications, pose an enormous challenge due to their curse of dimensionality. As these problems grow in size, their search spaces often become extremely large, discontinuous and complex; therefore, these problems are extremely difficult, if not impossible, to solve to optimality in polynomial time. The series of online advertisement scheduling problems that we address in this work are all combinatorial in nature and have been proven to be NP-hard [50, 51]. As is discussed below, common deterministic solution approaches are often ineffective in handling these types of problems; therefore, researchers have developed many alternative heuristic techniques.


Along these lines, one effort involves the use of metaheuristics. A metaheuristic is an algorithmic framework, approach or method that can be specialized to solve optimization problems [52]. Metaheuristics, which represent an effort by the research community to leverage the enormous amount of computing power that has become available in the recent past, are very popular among academics and practitioners and have been used very successfully to attack some of our most difficult optimization problems. We first provide a brief overview of the wide range of solution techniques applied to global optimization problems and then turn our focus to the two specific metaheuristic techniques which we utilize in our effort to provide appealing solution alternatives to the proposed online advertisement problems.

As mentioned above, solution techniques for global optimization problems have received considerable attention during the past couple of decades, and they fall into two main categories: deterministic and stochastic methods (figure 5). Deterministic methods, which include both calculus-based and enumerative techniques, attempt to find the local optima by developing a sequence of iterative approximations which converge to a point satisfying the necessary conditions. According to Filho et al. [53], calculus-based methods can be further subdivided into two classes: indirect and direct methods. The indirect methods search for the local optima by solving the set of equations which results from setting the gradient of the objective function equal to zero, restricting the search space to those points with slopes of zero in all directions.


Figure 5: Classes of Search Methods (basic model borrowed from [54]). The figure organizes search methods into deterministic techniques (calculus-based methods, both indirect and direct, and enumerative methods) and guided search techniques (evolutionary computing — genetic algorithms, genetic programming, evolutionary programming and evolution strategies — along with neural networks and simulated annealing).

The direct methods seek the local optimum by instead hopping along the function via a simple comparison technique formally known as hill climbing. Hill climbing begins with a random starting point and at each step selects at least two additional points located at predetermined distances from the starting point. Of these, the point with the most favorable local gradient is chosen as the starting point for the next step. Two obvious limitations of these methods are that they are local in scope and that they only work for smooth, well-defined functions. As a result, according to Goldberg [55], they lack the robustness to be very effective against real-world problems, which are often combinatorial in nature. Alternatively, enumerative methods search every point in the objective function's domain space one at a time. Obviously, these methods, unlike the calculus-based techniques, overcome the limited-scope issue, guaranteeing that the global optimum will be identified. However, because of the enormous number of feasible solutions for any large problem, this type of method requires significant solution time and computing power for real-world applicability.


As a result of the combinatorial nature of the problems, the time to solve them grows exponentially with the size of the problem. Most combinatorial optimization problems are NP-hard [50], and therefore are not likely to have an algorithm which can find the optimal point in polynomial time. Even dynamic programming, which is considered by many to be the most powerful enumerative method, breaks down for all but the smallest of problems [55]. Simply put, these techniques take too much time, lacking the efficiency to handle real-world problems [55].

In contrast, the guided random search methods, which include simulated annealing, evolutionary computing and neural networks, attempt to intelligently cover a larger portion of the search space, identifying as many local optima as possible with the hope that one of them will satisfy the global optimization conditions. These are guided random search techniques which are all based on enumeration. What separates them from the enumerative techniques discussed above is that they use supplemental information in an effort to intelligently guide the search process. Simulated annealing (SA), which was independently invented by Kirkpatrick, Gelatt and Vecchi in 1983 [56] and by Cerny in 1982 [57], is based on the laws of the thermodynamic annealing process. This technique attempts to deal with highly nonlinear combinatorial optimization problems by mimicking the process by which a metal cools into a minimum-energy crystalline structure. The search space is traversed probabilistically via a series of states which are based on a cooling schedule. Proponents claim that this technique is very proficient at avoiding entrapment in local optima.
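The cooling-schedule idea translates into very little code. The skeleton below is a generic illustration written for this overview, not an implementation from the cited works; the neighbor and cost functions, the initial temperature and the geometric cooling factor are all placeholders to be chosen per problem.

import math
import random

def simulated_annealing(initial, neighbor, cost, t0=1.0, alpha=0.95, steps=10000):
    state = best = initial
    t = t0
    for _ in range(steps):
        candidate = neighbor(state)
        delta = cost(candidate) - cost(state)
        # Accept improvements outright; accept worse moves with probability
        # exp(-delta / t), which shrinks as the temperature cools.
        if delta <= 0 or random.random() < math.exp(-delta / t):
            state = candidate
        if cost(state) < cost(best):
            best = state
        t *= alpha   # geometric cooling schedule (one common choice)
    return best

# Toy use: minimize f(x) = (x - 3)^2 starting from 0.
print(simulated_annealing(0.0,
                          lambda x: x + random.uniform(-1, 1),
                          lambda x: (x - 3) ** 2))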


The common theme of "nature knows best" holds with neural and evolutionary computation (EC) as well, each being a solution technique based on a naturally recurring process. Evolutionary computation has been used for over three decades and is an attempt to apply Darwin's basic theory of evolution to artificial scientific applications. In nature, organisms which are best suited and equipped to compete for the limited pool of resources have a higher probability of survival and are therefore more likely to prosper through the natural mating process. In doing so, they propagate the process of survival of the fittest by passing on their genetic information via the hereditary process. The work which makes up EC was begun independently by two different researchers. Rechenberg [58] introduced evolution strategies (ES) in an effort to achieve function optimization. Fogel [59] introduced evolutionary programming based on his work with finite state machines. Genetic algorithms, which differ from evolution strategies and evolutionary programming in their emphasis on the use of specific operators — especially crossover, which mimics the biological process of genetic transfer — were invented by John Holland and his colleagues at the University of Michigan in 1975 [60]. Genetic programming is an extension of genetic algorithms within which the members of the population are parse trees of computer programs. We discuss genetic algorithms in much more detail in section 4.2, as they are one of our chosen solution techniques. Neural computation, which represents yet another attempt by the research community to mimic a naturally occurring process, is believed by many to have been first proposed in the 1800s in an effort to determine how the human mind worked; however, formal theoretical analysis is believed to have been started by McCulloch and Pitts [20] in the 1940s. Current artificial neural network models offer massively parallel, distributed systems inspired by our never-ending attempt to model the anatomy and physiology of the cerebral cortex. These systems exhibit a number of useful properties, including learning and adaptation, approximation and pattern recognition, and have been successfully applied to many challenging application domains.


We also cover neural networks in more detail in section 4.3, as they complement genetic algorithms as one of our chosen solution approaches.

The online advertisement scheduling problems which we introduce are NP-hard, commonly consisting of a complex search space which is discontinuous and multimodal. As a result, the deterministic methods discussed above lack the robustness to effectively handle all but the most trivial problem instances. In an effort to provide efficient solution alternatives, we propose methods based on the theory of genetic algorithms and neural networks.

4.2 Genetic Algorithms

Genetic algorithms, which were developed by John Holland [60], are intelligent probabilistic search algorithms which have been applied to many different combinatorial optimization problems [61]. As mentioned above, genetic algorithms are based on the process of natural selection. During the course of evolution, natural populations evolve based on the principle of survival of the fittest. Organisms which are better prepared and better equipped to compete for the limited pool of resources tend to have a better chance to survive and reproduce, while those which are inferior tend to die off. As a result, the genes from highly fit organisms are likely to propagate in increasing numbers from generation to generation. Consequently, the combination of favorable characteristics from highly fit ancestors is likely to result in the production of individuals which are even more fit than those which preceded them. This evolutionary process often allows species to adapt in a way which makes them more and more capable of dealing with their environments.


Genetic algorithms attempt to simulate this process. A GA starts with an initial population of individuals (chromosomes), each representing one potential solution to the given problem. Similar to the evolutionary process, new generations of individuals are iteratively created from this initial population via the application of three primary genetic operators: reproduction, crossover and mutation. Each individual, which is encoded as a string or chromosome, has a fitness value which is evaluated with respect to the problem's objective function. The probability that an individual will have an opportunity to survive and/or reproduce is based on its relative fitness value. Those individuals (normally the highly fit candidates) which are chosen for reproduction generate offspring via the crossover of parts of their genetic material. The children are made up of characteristics inherited from both parents. The offspring commonly replace the entire prior population (generational approach); however, in some cases, a portion of the prior population may survive (steady-state approach). In addition, a small percentage of the genetic material of the individuals which make up the new population undergoes mutation. This operator affords the GA the ability to reclaim important genetic material which may have been lost in an earlier generation. In doing so, mutation effectively allows the GA to move to a different area of the search space [62]. Pseudocode for a basic GA is provided below.

begin
    Generate the initial population;
    Assign fitness values to the individuals in the initial population;
    Do until a pre-defined stopping criterion is met:
        Select the strings that will survive as is;
        Select strings for the mating pool;
        Mate the chosen parents via crossover;
        Apply mutation to the new strings earmarked for the next generation;
        Evaluate the fitness of the new population;
    Loop
end
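For concreteness, the following is a minimal runnable rendering of this pseudocode, demonstrated on the toy "one-max" problem (maximize the number of 1s in a bit string). The operator choices — tournament selection, one-point crossover and bit-flip mutation — are illustrative assumptions, not the configuration employed in this research.

import random

BITS, POP, GENS, MUT = 20, 30, 50, 0.01

def fitness(chrom):
    return sum(chrom)                      # one-max objective

def tournament(pop):
    return max(random.sample(pop, 3), key=fitness)

def crossover(a, b):
    cut = random.randint(1, BITS - 1)      # one-point crossover
    return a[:cut] + b[cut:]

def mutate(chrom):
    return [bit ^ 1 if random.random() < MUT else bit for bit in chrom]

population = [[random.randint(0, 1) for _ in range(BITS)] for _ in range(POP)]
for _ in range(GENS):                      # generational replacement
    population = [mutate(crossover(tournament(population), tournament(population)))
                  for _ in range(POP)]
print(max(fitness(c) for c in population))   # approaches BITS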


Although they are based on a semi-random process, genetic algorithms, because of their ability to exploit historical information, garner much higher expectations than would be given to a purely random process. As a result of their relative popularity, many alternative GA permutations have been developed. These include different representation, selection, crossover and mutation schemes. We have provided a very basic description of the process; a more thorough account can be found in the work of Aytug et al. [63], Dumitrescu [64], Goldberg [55] and Mitchell [65]. Although GAs do not guarantee optimality, because of their perceived proficiency in covering a large portion of the search space and their relative ease of implementation, genetic algorithms have been a very popular solution technique for combinatorial optimization problems in many different disciplines [61]. One area which deserves specific mention, as it is closely related to our work, is scheduling. Genetic algorithms have been applied to a plethora of different scheduling problems by many authors, including Aytug et al. [66, 67], Chen et al. [68], Biegel and Davern [69], Davis [70], Fang et al. [71], Fogel and Fogel [72], Li et al. [73], Liaw [74], and Wang [75]. This is by no means an exhaustive list; for a more thorough account, please refer to the works by Aytug et al. [63] and Back et al. [76].

One issue which has remained a significant challenge for GA researchers is the inclusion of constraints within the GA framework. Without constraints, the comparison of individuals (chromosomes) within a GA, as detailed in the basic description above, is fairly straightforward: the relative value of each individual is determined based on its performance with respect to the objective (fitness) function in comparison with that of the other chromosomes.


This is simple and intuitive. Dealing with the infeasibility of a potential solution in the constrained case is much more difficult. With the incorporation of constraints, we must consider not only each individual's performance with respect to the objective function, but also its performance against each of the constraints. Given that the constraints often represent limited resources, this is not an easy task. This problem has received considerable attention, and as a result many alternative approaches have been developed. These include the use of penalty functions, maintaining feasibility, and separating objectives and constraints. Coello [77], Michalewicz [78, 79] and Sarker [80] provide very good surveys and critiques of the many contrasting ideas (in addition, Coello maintains a dedicated Web site [77] within which a list of related work is published). Unfortunately, although many different techniques have been proposed, none of them has gained a consensus as being the best. Instead, as Coello [77] points out, most of the techniques are very good at handling certain types of problems, but their generalizability is very limited.

Given that our online advertisement scheduling problems fall into the constrained optimization category, this lack of consensus presents a challenge. We employ a method which maintains feasibility. This technique, which is thought by many, including Coello [77] and Liepins et al. [81, 82], to be a very good technique relative to the other alternatives, involves the use of repair operators which maintain feasibility. Instead of searching the entire landscape, this method limits itself to the feasible region. This technique has been employed successfully by Raidl [83], Michalewicz and Nazhiyath [84], Tate and Smith [85], Xiao et al. [86, 87], and many others. One factor which limits the generalizability of this technique is the need for a problem-specific repair operator. A simple illustration of a greedy repair operator is sketched below.
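The sketch shows the flavor of a greedy repair operator on a knapsack-style capacity constraint, loosely analogous to fitting ads into a limited advertising space. It is a hypothetical stand-in for exposition only, not one of the problem-specific operators developed for the models in this work.

def repair(selected, sizes, values, capacity):
    """Greedy repair: drop the least valuable items per unit of size until feasible."""
    selected = set(selected)
    while sum(sizes[i] for i in selected) > capacity:
        worst = min(selected, key=lambda i: values[i] / sizes[i])
        selected.remove(worst)               # greedy repair step
    return selected

print(repair({0, 1, 2}, sizes=[4, 3, 5], values=[8, 9, 5], capacity=8))
# -> {0, 1}: item 2 has the lowest value density and is dropped first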


However, the need for problem-specific operators is not extremely limiting because, for most problems, identification of sufficient repair operators is not a difficult process. For all of the problems that we propose, greedy-based heuristics are available or will be developed. We will use these heuristics as our repair operators.

Another issue that must be considered when employing this type of constrained GA solution technique is how to utilize the repaired chromosomes. The basic question is whether or not the repaired individuals should be allowed to reenter the general population. Proposed methods to handle this situation cover the entire continuum, from those such as Liepins [81, 82], who recommends that none of the repaired solutions be allowed to enter the general population, to those such as Nakano [88], who argues that every chromosome should be returned. Not surprisingly, many others, such as Michalewicz and Dasgupta [89] and Orvush et al. [90], argue that the best alternative falls somewhere in the middle. We follow the recommendation of Nakano [88], not discarding any of the chromosomes.

4.3 Neural Networks

Mankind has long been fascinated and endlessly motivated in our attempts to understand the amazing power of the human brain. Although our understanding of this process is considered by most to be extremely incomplete, years of research have led to a basic understanding of how the brain trains itself to process information. Neural networks found in the brain consist of billions of specialized cells called neurons. These neurons are organized into a very complicated intercommunication network via axons and dendrites (see figure 6). Each neuron is connected to, and therefore collects information from, tens of thousands of other neurons. Information is shared in the form of varying degrees of electrical impulses. A neuron sends out an electrical signal through a specialized extension called an axon.


At the end of each axon, a specialized structure called a synapse converts the signal into a series of electrical effects which may inhibit or excite activity in the connected neurons. As a result of this activity, the connected neurons make adjustments to their electrical activity. Learning occurs via a series of adaptations of the sensitivity/connection strengths of the synapses, which in turn changes the level of influence that one neuron may have on another. It is the many different patterns of firing activity created by this simultaneous cooperation among the extensive network of neurons which provides the astounding processing power of the brain.

Figure 6: Pictorial Representation of the Cerebral Cortex [91]

Artificial neural networks (ANNs) attempt to mimic and, in doing so, leverage some of the astounding power of this process. Although, as a result of our limited understanding of the process and the limited amount of computing power available, our efforts obviously represent a gross simplification of the natural process, the resulting models have proven to be very powerful. The common sentiment is that McCulloch and Pitts [92] in the 1940s were the first researchers to attempt to quantitatively model this process. Since then, many researchers have attempted to improve upon their work. As is the case with genetic algorithms, this intense research focus has resulted in many different variations of ANNs.


We will not, in this work, attempt to introduce all of them. Instead, we provide a general description of the most basic model and discuss in detail the model variation that we have chosen to employ.

ANNs consist of a series of processing units, often called nodes, which represent the neurons of the artificial system. Each node has a series of inputs, an activation function and an output. The nodes are configured in the form of an interconnected network, with each connection having an associated weight which determines the relative strength of the connection (see figure 7). These weights determine how influential one node (neuron) will be on another. Through an iterative series of steps, inputs are commonly transferred from one layer of nodes to another. Having received its inputs, a node preprocesses the information and applies an activation function to the preprocessed data in creating the final output of the node. Many different preprocessing functions have been proposed and tested, such as summation, cumulative summation, and the product of the weighted inputs. Likewise, many different activation functions have been proposed and tested. These include, but are not limited to, linear functions, step functions, hyperbolic tangent functions, sigmoid functions, and tangent-sigmoid functions. In addition, many different variations of the artificial neural network have been developed with respect to network topology, direction of information propagation and weight modification schemes. Given the number of alternatives, cataloging the variations of ANNs would be a formidable task. For a more thorough review, please see the work by Kartalopoulos [93].
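The node behavior just described — weighted inputs, a preprocessing (summation) step and an activation function — reduces to a few lines of code. The toy forward pass below uses summation and a sigmoid activation; the network shape and weights are arbitrary values chosen for the example.

import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def layer(inputs, weights):
    """Each node sums its weighted inputs and applies the activation function."""
    return [sigmoid(sum(w * x for w, x in zip(node_w, inputs)))
            for node_w in weights]

hidden_w = [[0.5, -0.2], [0.3, 0.8]]   # 2 inputs -> 2 hidden nodes
output_w = [[1.0, -1.0]]               # 2 hidden nodes -> 1 output node
x = [0.9, 0.1]
print(layer(layer(x, hidden_w), output_w))   # feed-forward pass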


Figure 7: Pictorial Representation of a Basic Feed-Forward ANN [91]

Since their inception, ANNs have been widely applied in many domains such as classification, pattern recognition, time series prediction, and optimization. The idea of using neural networks as a solution approach for NP-hard optimization problems [50, 51] originated in 1985, when Hopfield and Tank [94] applied a Hopfield neural network as a solution technique for the Traveling Salesman Problem (TSP). They used an energy function to capture the constraints of the problem; this energy function was then minimized using a neural network. Since this seminal work, the area has received increasing attention, with many researchers developing new techniques and attacking a number of different NP-hard optimization problems with neural network methodologies. Several researchers have applied neural networks to different variations of scheduling problems. These include Poliac et al. [95], Sabuncuoglu and Furgun [96], Foo and Takefuji [97], Lo and Bavarian [98], Satake et al. [99], and Lo and Hsu [100]. For a more thorough review, please see the works by Burke and Ignizio [101], Looi [102], Sabuncuoglu [103], Huang and Zhang [104], and Smith [105].

We have chosen to employ an Augmented Neural Network (AugNN), a very promising variation of the neural network architecture proposed by Agarwal et al. [106, 107].


Common complaints concerning many other neural network based approaches for combinatorial optimization problems are that they tend to get stuck in local optima and that they are often very inefficient with respect to computational complexity. As a result, their performance often deteriorates exponentially with problem size. These concerns have been especially common for Hopfield-based solution techniques. Early applications of the AugNN approach are very promising with respect to these concerns. In the AugNN approach, the traditional neural network model is augmented to allow for the embedding of domain and problem-specific knowledge. AugNN is a metaheuristic approach which takes advantage of both the heuristic approach and the iterative local-search approach. AugNN utilizes a proven base heuristic to exploit the problem-specific structure and then iteratively searches the local neighborhood randomly in an effort to improve upon the initial solution. In this approach, the optimization problem is formulated as a neural network of input, hidden and output nodes. As in a traditional neural network, weights are associated with the links between the nodes. Input, output and activation functions are designed which both model the constraints and apply a particular base heuristic. The chosen base heuristic provides the algorithm with a starting solution/neighborhood in the feasible search space. After at most n iterations, or an epoch, a feasible solution is generated. A learning strategy is used to modify the relative weights after each epoch in an attempt to produce improved neighborhood solutions in future epochs. If improvements are found from epoch to epoch, the weights may be reinforced. In addition, if an improved solution is not identified within a predetermined number of epochs, the algorithm has the ability to backtrack. Because of its learning characteristics and its ability to leverage the relative problem structure, AugNN tends to find good solutions faster than traditional metaheuristic approaches such as tabu search, simulated annealing and genetic algorithms.


In addition, because it does not involve any relaxations or LP solutions, it is a relatively simple technique in terms of computational complexity and ease of implementation. The procedure was initially applied to the task-scheduling problem by Agarwal et al. (2003). More recently, AugNN approaches have been successfully applied by Agarwal et al. to the flow shop scheduling problem [108], the task scheduling with nonidentical machines problem [109], the open shop scheduling problem [110], and the bin packing problem [111]. One consideration that must be addressed when developing an AugNN-based model is which heuristics to utilize. Original efforts focused solely on the use of greedy-based heuristics; however, Agarwal et al. [107] have recently demonstrated that this may not always be the best strategy. They prove that using a combination of greedy and non-greedy heuristics within the augmented neural network formulation provides better solutions than using either alone. We follow this strategy when developing our AugNN models. See the experimental design for more details.

4.4 The No Free Lunch Theorem

Although genetic algorithms and neural networks are extremely popular and have both been applied to a wide variety of optimization problems with very favorable reported results, recently many have begun to question their relative performance comparisons. The No Free Lunch (NFL) Theorem presented by Wolpert and Macready [112, 113] proves that generalized deterministic and stochastic algorithms, often called black-box algorithms, such as genetic algorithms, neural networks and simulated annealing, have the same average performance when executed over the entire set of problem instances.


This theorem implies that these techniques are no better with respect to average performance than a random search method over the entire set of problems. Opponents of the theorem initially criticized it, claiming that it is very uncommon for one to claim that a technique performs better than some other algorithm for all problem instances; instead, superiority is often claimed only for a subset of the problem space. In response to this criticism, Schumaker et al. [114] have proven that the theorem also holds for a subset of the problem instances. This line of research has initiated and motivated academic discussion on some very difficult questions that needed to be asked; however, it is fortunately not as limiting as it may seem at first glance. Whitley and Watson [115], Koehler [116], Kimbraugh et al. [117], and Igel and Toussaint [118] indicate that there are a number of situations in which the NFL theorem does not apply. Whitley and Watson [115] point out several limitations of the NFL theorem, including that it has not been proven to hold for the set of problems in the NP complexity class; they also indicate that doing so would prove that NP ≠ P. Koehler [116] extends this body of research by showing that the NFL theorem holds for only trivial subclasses of the binary knapsack, traveling salesman and partitioning problems. Although these NFL theorems are not quite as widely applicable as once thought, every effort should be made to understand and consider their implications.


CHAPTER 5
RESEARCH MODEL(S)

Two of the most pressing problems facing companies in the online advertising industry are the estimation of user behavior and advertisement scheduling. We propose and test potential solution techniques for each of these problems.

In the previous chapters, we have provided a basic introduction to online advertising, information retrieval, and large scale search, each of which plays an integral part in our research model. In this chapter, we outline the specific online advertising problems that are addressed in this research and discuss, in detail, our research plan for each of those problems. In section 5.1, we provide a brief summary of the problems at hand. In section 5.2, we discuss in detail the proposed method for estimating user behavior with respect to online advertisements. In section 5.3, we discuss in detail three new variants of the NP-hard online scheduling problem and our proposed solution techniques.

5.1 Problem Summary

As presented in section 2.2, the initial pricing model for online advertising was the CPM (cost per mille) model, which was adopted from the traditional print and television media industries. The payment structure of the CPM model is based solely on the number of ad impressions served. The publisher is paid a set fee for each ad impression served, regardless of its effectiveness. Under this model, financial considerations motivate the publisher to focus primarily on one thing: serving as many ad impressions as possible. Within the print and television mediums, unlike the online medium, it is very difficult to determine the effectiveness of a particular advertisement.


Trends in sales and revenue generation can only be indirectly tied to a particular advertising campaign. Unless a consumer specifically indicates that a purchase is the result of a particular advertisement to which he or she was exposed, it is all but impossible to make that connection. However, this is not the case with online advertising. As a result of the two-way flow of information, it is often much easier to measure the effectiveness of an online advertisement. Immediate post-exposure behavior by the user is easily tracked by the advertiser and the publisher. They can immediately tell if the user clicks on the advertisement, sets up an account with the advertiser, makes a purchase from the advertiser, etc. This behavioral visibility has led many advertisers to question the efficacy of the CPM pricing model for the online industry. They believe that, based on the more open, bidirectional flow of information, it may be in everyone's best interest to instead tie pricing to one or more of these user behaviors. As a result, several performance-based pricing models, such as CPC (cost per click), CPS (cost per sale) and CPA (cost per acquisition), were developed and have become extremely popular. Many firms still use the pure CPM model; however, a large portion of the industry has moved to models which are either based solely on performance measures or are hybrids which incorporate both CPM and performance-based criteria. These models are generally considered to provide a more equitable risk-sharing relationship. With this industry movement towards performance-based pricing models, the task facing many publishers has become much more difficult. No longer can they simply focus on randomly serving as many ads as possible; they are now faced with two major challenges. In an effort to maximize the utilization of their most precious resource, advertising space, they must still attempt to serve as many ads as possible, but in addition they must now attempt to do it intelligently.


This leaves publishers with two distinct problems. First, they must attempt to estimate user behavior with respect to specific ads, and second, they must attempt to schedule the delivery of advertisements accordingly. We propose potential solution techniques for each of these problems.

5.2 Information Retrieval Based Ad Targeting

Given the current popularity of performance-based pricing within the online advertising industry, many publishers find that a large portion of their revenue stream is dictated by the actions of users. Each time a user shows interest in a particular advertisement by clicking the ad, making a purchase, etc., the publisher is paid a fee. In an effort to maximize revenue, publishers are eager to increase the probability of these actions taking place. One assumption that underlies this portion of our work is that the likelihood of a user taking action with respect to a particular advertisement increases with a higher level of interest in the good or service being advertised. This assumption is very logical and is widely accepted within the industry and the academic research community. Based on this assumption, the obvious solution would be for publishers to serve users advertisements for products and services in which they have sincere interest; however, this is a difficult task.

Unfortunately for publishers, estimating a user's affinity for certain products and services is a very challenging and controversial task. Although users and publishers have common interests, in that users would also prefer to be exposed to ads for products and services in which they have an interest rather than to those in which they do not, the real problem comes in getting from point A to point B. How does a publisher gain an understanding of a user's interests?


This is a very sensitive subject that must be addressed with extreme caution. Users are very protective of their privacy and of the efficiency of their Web surfing experience. Any effort on the part of a publisher which violates either is very likely to have a depressing effect on long-term corporate revenue.

As indicated in section 2.2, several methods of developing this estimation of a user's behavior have been proposed and tested. These methods include, but are not limited to, analyzing a user's search queries [16, 17], analyzing a user's prior click behavior [18] and analyzing user-specific demographic data [21]. We recommend and test an alternative approach which is analytically appealing and not overly intrusive. We recommend a method based on the detailed analysis of the raw HTML which composes a user's recent Web surfing history. The basic intuition is that by analyzing a user's recent browsing history, we can improve our understanding of his or her current interests. To the best of our knowledge, no one else has specifically recommended or tested this type of approach, and therefore we feel that our unique application of information retrieval and lexical techniques as a method of analysis will contribute positively to the current body of literature. Our goal is to provide those in industry with a viable alternative with which to address this difficult challenge. The basic steps of our process are as follows:

1. Track a user's surfing behavior for a predetermined period of time
2. Collect the corresponding HTML pages
3. Develop a characteristic array for each user by parsing their respective HTML pages using IR and lexical-based techniques
4. Develop a characteristic array for each advertisement
5. Using a chosen similarity measure, evaluate each ad/user combination
6. Serve the ads accordingly and measure the effectiveness of the model


Steps 1 & 2: Tracking a user's surfing behavior and collecting HTML pages. We tracked the surfing behavior of 68 students for a period of at least 2 hours and captured their respective HTML files accordingly. Fourteen of the students failed to follow the instructions in one way or another, leaving us with 54 users for the project. Students were chosen as the users for this project based solely on their availability and willingness to participate.

Steps 3 & 4: Develop a characteristic array for each user and advertisement. Developing a characteristic array for each user is a three-step process. First, we parse the set of HTML pages for a given user into a term vector. Second, we determine the relative importance of each term with respect to developing a characterization of the user's interests through his or her chosen HTML pages. To accomplish this task we employ several variations of the basic model set forth by Cecchini [119], which incorporates the use of WordNet concepts. We modify his basic model to also allow for structural analysis, as follows:

1) $w_{i,u} = \left( \sum_{d} \frac{\sum_{z} s_z \, tf_{i,z,d}}{dl_d} \right) \frac{df_i}{N}$

2) $w_{c,u} = \sum_{i \in c} w_{i,u}$

3) $w_{c,u} = \frac{T_c}{|c|} \sum_{i \in c} w_{i,u}$

where
$w_{i,u}$ — the importance of term $i$ in domain $u$
$s_z$ — the weight factor assigned to structural element $z$
$tf_{i,z,d}$ — the frequency of term $i$ in structural element $z$ of document $d$
$dl_d$ — the document length (total number of terms) of document $d$
$N$ — the number of HTML documents present in a user's domain $u$
$df_i$ — the total number of documents within which term $i$ appears


Function 1, which calculates an estimate of the relative weight of each term $i$ in user domain $u$, is composed of two distinct parts. The first is a weighted term frequency calculation which is normalized by the document length and which incorporates an $s_z$ term into the weight function. The $s_z$ term, $0 \le s_z \le 1$, represents the relative weight assigned to structural element $z$ of the HTML documents. This term allows us to employ the structural analysis which has been recommended by several researchers, including Navarro and Yates [38, 39] and Burkowski [40]. Recall that the structural elements of the HTML documents that we consider in our analysis include the keywords, body and title. Each of these sections is easily identified by its start and stop tags. The basic intuition behind structural analysis is that the terms found in one part of the document may hold more information than those found in other sections. For a particular weighting scheme, the assignment of a higher weight to a particular section follows from an underlying assumption that the associated section will produce concepts of higher informational value than the alternative sections which have been assigned a lower weight. For example, in scheme 2 from table 1, the keywords section is assigned the largest weight of 0.7; therefore, it receives considerably more prominence than the title and body sections. We test several different weighting schemes within our analysis in an attempt to identify the best weighting combination. The tested weighting schemes are detailed in table 1 below. Although it was not extremely prevalent, we did find that a small percentage of the HTML documents did not have a keywords section. As a result, as noted in the referenced table, we have provided a contingent weighting distribution for each of the schemes which overrides the original scheme in this situation.


Table 1: Structural Element Weighting Schemes

                                   Title   Body   Keywords
Scheme 1                           0.3     0.2    0.5
Scheme 1 (no keywords section)     0.7     0.3    0
Scheme 2                           0.2     0.1    0.7
Scheme 2 (no keywords section)     0.7     0.3    0
Scheme 3                           0.25    0.5    0.25
Scheme 3 (no keywords section)     0.3     0.7    0

Unlike the traditional IR task of separating documents based on their individual representations, we are instead attempting to develop one representation for the entire set of documents for a given user $u$. Given this objective, a term which appears in many of the documents is anticipated to have greater informative power than one which appears in only a small number of documents. This is the motivation behind the second term of function 1, $df_i/N$. In function 2, we generalize our term representation scheme by introducing the notion of concepts, $c$. Each concept $c$ represents a synset and is composed of the terms $\{i_1, i_2, \ldots, i_n\}$ that make up that synset. Recall from our discussion of WordNet that a synset is composed of the chosen term and all of its synonyms. Considering concepts allows us to avoid over- or underestimating the importance of a particular term by aligning it with its synonyms. Function 2 provides a concept weight by summing the weights of all terms $i$ in the synset $c$. In some cases, the same term may appear in more than one concept. We adopt a method introduced by Sacaleanu and Buitelaar [120] in function 3 to facilitate the assignment of the term to one of the concepts in this situation. Function 3 includes an additional term $T_c/|c|$, where $T_c$ is the total number of terms within concept $c$ and $|c|$ is the cardinality of the concept. The term is assigned to the concept with the highest score from function 3. A small illustrative sketch of this weighting follows.
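As an illustration of how functions 1 and 2 might be computed, the sketch below assumes each HTML page has already been parsed into per-section term counts and that synsets are supplied as plain term lists; both are simplifying assumptions made for exposition, and function 3's tie-breaking between concepts is omitted for brevity.

from collections import defaultdict

S = {"title": 0.3, "body": 0.2, "keywords": 0.5}   # structural weights s_z (scheme 1)

def term_weights(docs):
    """Function 1: w_iu = [sum_d (sum_z s_z * tf_izd) / dl_d] * (df_i / N)."""
    N = len(docs)
    df = defaultdict(int)
    raw = defaultdict(float)
    for doc in docs:                                 # doc: {section: {term: count}}
        dl = sum(sum(counts.values()) for counts in doc.values())
        seen = set()
        for z, counts in doc.items():
            for term, tf in counts.items():
                raw[term] += S[z] * tf / dl
                seen.add(term)
        for term in seen:
            df[term] += 1
    return {t: w * df[t] / N for t, w in raw.items()}

def concept_weights(w_terms, synsets):
    """Function 2: a concept's weight is the sum of its member terms' weights."""
    return {c: sum(w_terms.get(t, 0.0) for t in terms)
            for c, terms in synsets.items()}

docs = [{"title": {"cook": 1}, "body": {"cook": 2, "pasta": 1},
         "keywords": {"recipe": 1}}]
w = term_weights(docs)
print(concept_weights(w, {"cook.v.01": ["cook", "prepare"]}))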


This functional analysis results in the interests of each user $u$ being represented as

$U_u = (w_{1,u}, w_{2,u}, \ldots, w_{n,u})$

where $n$ is the total number of concepts in the domain of user $u$ and $w_{c,u}$ represents the weight assigned to concept $c$ for user $u$. Please see Appendices C1-C5 for sample input, word and weighted concept files for a user in our study.

The last step in this stage of the process is to develop a similar vector representation for each advertisement. This was completed semi-manually. First, we manually assigned descriptive terms to each advertisement. Please see Appendix B for a list of the ads and their respective characteristic arrays. In a real-world application, we recommend that this task be completed by a marketing expert for each product or service. Next, WordNet was used to develop a concept representation for each of the advertisements. Finally, we assigned a relative importance weight to each concept for a given advertisement. This weight represents the relative importance of that concept in describing the given advertised product or service. This process results in each advertisement being represented as

$A_j = (w_{1,j}, w_{2,j}, \ldots, w_{n,j})$

where $n$ is the total number of concepts in the domain of advertisement $j$ and $w_{c,j}$ represents the weight assigned to concept $c$ for advertisement $j$. Although manual development of vector representations is not uncommon, and in some cases offers improved accuracy, it is probably not the most efficient approach [22]. It works well for our project, but it may not be a feasible alternative in a large-scale operation; therefore, one extension to our work may be to automate this process for advertisements.


Step 5: Using a chosen similarity measure, evaluate each ad/user combination. The goal of this model is to rate advertisements on their likelihood of being of interest to a particular user. We estimate this series of likelihoods based on the similarities of the respective user and advertisement vector representations via the vector space model [24, 34]. Recall from Section 3.3 that the vector space model estimates the similarity between two $n$-dimensional normalized vectors based on the size of the angle which separates them in $n$-dimensional space; the measure is calculated by taking the dot product of the two vectors. In order to apply a similar technique, we first need to adapt each advertisement vector $A_j$ to include a term for every concept present in the user's domain space $u$. This is accomplished as follows:

$w_{c,j} = \begin{cases} w_{c,j} & \text{if concept } c \text{ is present in user } u\text{'s domain space} \\ 0 & \text{otherwise} \end{cases} \quad c = 1, \ldots, n$

where $n$ is the number of concepts in the domain space $u$ of the user. Given the two $n$-dimensional vectors $U_u$ and $A_k$, we calculate their similarity as follows:

$sim(U_u, A_k) = \dfrac{\sum_{c=1}^{n} w_{c,u}\, w_{c,k}}{\sqrt{\sum_{c=1}^{n} w_{c,u}^{2}}\ \sqrt{\sum_{c=1}^{n} w_{c,k}^{2}}}$

This similarity score, also called the retrieval status value (RSV), is calculated for each ad/user combination and is used to rank the advertisements. An advertisement's RSV score is used as a proxy measure of its relevance for a particular user: the higher the score, the greater the presumed relevance.
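As a concrete sketch of the RSV computation (our code; all names are ours):

```python
import math

def rsv(user_vec, ad_vec):
    """Cosine similarity between a user concept vector and an ad concept
    vector, both given as dicts mapping concept -> weight."""
    # restrict the ad vector to concepts present in the user's domain space
    concepts = user_vec.keys()
    dot = sum(user_vec[c] * ad_vec.get(c, 0.0) for c in concepts)
    norm_u = math.sqrt(sum(w * w for w in user_vec.values()))
    norm_a = math.sqrt(sum(ad_vec.get(c, 0.0) ** 2 for c in concepts))
    if norm_u == 0 or norm_a == 0:
        return 0.0
    return dot / (norm_u * norm_a)

# Ads would then be served in descending order of RSV, e.g.:
# ranked = sorted(ad_vecs, key=lambda a: rsv(user_vec, a), reverse=True)
```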


Step 6: Serve the ads accordingly and measure the effectiveness of the model. The last phase of this part of our project is to evaluate the effectiveness of the model. We created a corpus of advertisements consisting of 100 arbitrarily chosen ads (for a list of the products and services represented by this corpus, please see Appendix B). From this corpus, each user was provided with a set of advertisements and asked to rate, on a scale of 1 to 5, their level of interest in the respective product or service, using the following scale:

Product/Service Ranking Scale
  1 - No Interest
  2 - Little Interest
  3 - Moderate Interest
  4 - High Interest
  5 - Very High Interest

One subset of the advertisements served to a given user was chosen randomly (20 ads), while the remaining ads were selected based on the similarity ranking functions described above (the top 20% of ads for each weighting scheme were selected). As expected, there was quite a bit of overlap in the ads selected by the different weighting schemes. Any ad which appeared in more than one category was served only once.

Hypothesis 1: The IR based ad selection method will be more effective than the commonly used random model in selecting targeted ads from a given advertisement corpus with respect to the level of interest to a given user.

As previously mentioned, in an effort to develop a good set of initial structural weighting schemes, we manually analyzed the raw code from a sample of the users' HTML documents. A secondary result of this analysis is that we also developed a preliminary opinion as to the relative importance of the different structural sections of the HTML documents. We test this intuition in Hypothesis 2.


Hypothesis 2: Within our model, the keyword section of the HTML documents provides more information than the other structural sections.

Results of the experiments and tests of the hypotheses are presented in detail in Chapter 6.

5.3 Online Advertisement Scheduling

The second major challenge facing publishers is the development of an advertisement schedule. This is a difficult task that must be repeatedly performed by each publisher. The most precious resource a publisher has is its online advertising space; therefore, the publisher must make every attempt to use it as efficiently as possible. Developing a good schedule is widely accepted as the most important task in achieving the publisher's goal of maximizing revenue. Although online advertisements may take several different forms, including rich media and popups, in this work we focus on the most common form, banner advertising. Banner advertising has long been the staple of the online advertising industry and it is still vitally important. In an attempt to make the best use of their advertising space, publishers proactively develop ad schedules for a predefined planning horizon. This problem takes the form of a constrained optimization problem. Although every publisher faces a similar problem, the relative model complexity can vary substantially depending on which pricing model is chosen and whether or not any other efforts are employed. We propose three extensions of the basic Maxspace model. The new models extend the basic Maxspace model to include efforts that are very common in industry. Solution techniques for each of these extensions are proposed and tested.


5.3.1 The Modified Maxspace Problem (MMS)

The most basic situation with respect to online advertisement scheduling involves a pure CPM pricing model and no additional incorporation of intelligence by the publisher. This problem was introduced into the academic literature in 2002 by Adler, Gibbons and Matias in their seminal online advertising paper [6]. They named it the Maxspace problem because the primary goal of the publisher is to schedule ads in a manner which uses the maximum amount of available advertising space. The first model that we propose is a slight variation of this basic Maxspace problem. Unlike the basic Maxspace problem, which has a hard frequency constraint requiring that an ad be served exactly $w_i$ times within the planning horizon if it is selected, this problem instead allows an acceptable frequency range for each advertisement, which is very common in practice. The frequency bounds provide needed flexibility to the publishers. An IP formulation of this problem is as follows:

$\max \sum_{j=1}^{N} \sum_{i=1}^{n} s_i x_{ij}$

subject to

(1) $\sum_{i=1}^{n} s_i x_{ij} \le S, \quad j = 1, 2, \ldots, N$

(2) $L_i - M(1 - y_i) \le \sum_{j=1}^{N} x_{ij} \le U_i + M(1 - y_i), \quad i = 1, 2, \ldots, n$

(3) $-M y_i \le \sum_{j=1}^{N} x_{ij} \le M y_i, \quad i = 1, 2, \ldots, n$

(4) $x_{ij} \in \{0,1\}$, where $x_{ij} = 1$ if ad $i$ is assigned to ad slot $j$, and 0 otherwise

(5) $y_i \in \{0,1\}$, where $y_i = 1$ if ad $i$ is assigned, and 0 otherwise


where:

  $n$ = number of advertisements
  $N$ = number of banner/time slots
  $S$ = banner height
  $s_i$ = height of advertisement $i$, $i = 1, 2, \ldots, n$
  $L_i$ = lower limit on the frequency of ad $i$, $i = 1, 2, \ldots, n$
  $U_i$ = upper limit on the frequency of ad $i$, $i = 1, 2, \ldots, n$
  $M$ = a large penalty greater than the number of banner/time slots

The first constraint ensures that the combined height of the set of ads scheduled for each banner slot does not exceed the available space. An assumption of the model is that if an advertisement is chosen, the number of delivered impressions for that ad must fall between predefined upper and lower bounds. Constraints (2) and (3) combine to ensure this relationship. They assure that if an ad is not served, its frequency is bounded by 0 in both directions, and that if an ad is served, its frequency lies between the lower and upper bounds. If an ad is served, constraint (2) dominates constraint (3) and the frequency is thus constrained by the lower and upper bounds. If an ad is not served, constraint (3) dominates constraint (2), which forces the frequency to 0. Thus, these constraints prevent any number of impressions that is not either 0 or between the bounds. We acknowledge that this represents a slight variation from the model solved in Adler et al. [6], Kumar et al. [5, 13] and Freund et al. [9]. In the model presented in those papers, constraints (2) and (3) are presented as

$\sum_{j=1}^{N} x_{ij} = w_i y_i, \quad i = 1, 2, \ldots, n$

which assures that the ad is served exactly the prescribed number of times over the planning horizon.


Another slight variant is proposed by Amiri and Menon [10]. In their formulation, constraints (2) and (3) are replaced by

$w_i \le U_i y_i, \quad i = 1, 2, \ldots, n$

(where $w_i$ denotes the number of impressions of ad $i$), providing only an upper bound on the number of times the ad is served. Within the industry it is very common for an advertisement to have both an upper and a lower bound on the frequency; therefore, we have adapted our formulation accordingly. We could have alternatively used a fixed-charge approach to link $x_{ij}$ and $y_i$ of the form

$L_i y_i \le \sum_{j=1}^{N} x_{ij} \le U_i y_i, \quad i = 1, 2, \ldots, n$

$\sum_{j=1}^{N} x_{ij} \le M y_i, \quad i = 1, 2, \ldots, n$

However, this would necessitate including a penalty on $y_i$ in the objective function to assure that $y_i = 0$ in the case where $x_{ij} = 0$, running the risk that the chosen artificial penalty could affect the optimal outcome. Finally, constraint (4) assures that at most one copy of each ad can appear in any given slot: by definition of the decision variables, the same ad cannot be displayed multiple times in a given banner slot. This model is based on the assumption that the revenue generated increases linearly in the volume of the ad. This assumption will be relaxed in our last model.

Many publishers still follow a variant of this basic model, making the Maxspace problem very popular in the research literature. Adler et al. [6] prove that the ad scheduling problem is NP-hard. Therefore, it is highly unlikely that it can be solved by an efficient optimization algorithm [50]. As noted above, many researchers, including Freund and Naor [9], Amiri and Menon [10], and Kumar et al. [5, 13], have proposed approximation solution techniques for the Maxspace problem, although none have solved the variation presented above. We extend this line of research by proposing and testing several heuristic and metaheuristic approaches for the proposed variation of the Maxspace problem. In addition, we propose two additional models which extend this basic model but are considerably more demanding in terms of computational complexity.
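For readers who wish to experiment with this formulation, the sketch below builds the MMS model with the open-source PuLP modeling library (our illustration, not part of the dissertation; all names and the sample data are ours):

```python
import pulp

def build_mms(n, N, S, s, L, U):
    """Modified Maxspace IP: maximize used space subject to slot
    capacity (1) and per-ad frequency bounds (2)-(3)."""
    M = N + 1  # large penalty greater than the number of slots
    prob = pulp.LpProblem("MMS", pulp.LpMaximize)
    x = pulp.LpVariable.dicts(
        "x", [(i, j) for i in range(n) for j in range(N)], cat="Binary")
    y = pulp.LpVariable.dicts("y", range(n), cat="Binary")
    prob += pulp.lpSum(s[i] * x[i, j] for i in range(n) for j in range(N))
    for j in range(N):  # (1) slot capacity
        prob += pulp.lpSum(s[i] * x[i, j] for i in range(n)) <= S
    for i in range(n):
        freq = pulp.lpSum(x[i, j] for j in range(N))
        prob += freq >= L[i] - M * (1 - y[i])  # (2) lower bound if served
        prob += freq <= U[i] + M * (1 - y[i])  # (2) upper bound if served
        prob += freq <= M * y[i]               # (3) forces 0 if not served
        # the lower half of (3), -M*y_i <= freq, holds trivially since x >= 0
    return prob

# Hypothetical data:
# prob = build_mms(3, 5, 10, s=[4, 3, 5], L=[2, 1, 2], U=[4, 3, 5])
# prob.solve(); print(pulp.value(prob.objective))
```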


5.3.2 The Modified Maxspace Problem with Ad Targeting (MMSwAT)

Given the industry migration towards performance-based pricing models, ad targeting has become a focal point of discussion and is considered by many to be the most important effort in online advertising [121]. Ad targeting is an industry term used to describe any effort to deliver an advertisement to a subset of the advertisement time slots based on an estimate of the likelihood that the user viewing those ad slots will act on it. In targeting, it would be ideal for a publisher to have an accurate probability matrix indicating, for each of its time slots, the probability that a given user would perform the action of interest (clicking, for example) for each of its advertisements. This would allow the publisher to be very precise in targeting each individual ad; however, as a result of privacy laws and the large number of time slots, advertisements and users, development and maintenance of such a matrix is considered, in most cases, to be impractical if not impossible. Instead, advertisements are commonly targeted to a subset or cluster of the publisher's time slot population. These clusters are chosen in an effort to maximize inclusion of the advertiser's target audience. For example, one very popular method is to cluster based on geographic regions. A company's target audience is often geographically concentrated in certain areas. When this is the case, it is logical to target that company's ads to users/time slots located in those local regions. This technique has recently grown substantially in popularity: the current projected year-over-year revenue growth rate for local online advertising is approximately 50%, which is more than twice the projected growth rate for online advertisement spending as a whole [122].


Other popular methods of time slot clustering include clustering based on a Web site's page content, the time of day, the day of week, a user's bandwidth, Nielsen's DMA regions, etc. Publishers often use these methods simultaneously in an effort to improve performance. Earlier in this work, we discussed our proposed efforts to provide another good alternative clustering method based on the application of IR techniques (see Section 5.2).

As the first of our two extensions to the modified Maxspace model, we model the scenario where a publisher has chosen a hybrid pricing model and is therefore employing some type of advertisement targeting. Incorporation of the hybrid pricing model is important because it is very popular in industry; however, it has thus far received very little attention in the academic research literature. We do not distinguish between the different methods of targeting; instead, our model focuses only on step two of this process, assuming that the targeting method(s) have been previously chosen and that the users/time slots have been clustered accordingly. We acknowledge that the proposed model may require slight adjustments depending on which targeting methods are chosen in step 1. However, we have attempted to make the model as general as possible in an effort to increase its scope of applicability. We now provide a basic description of the model extension.

The major difference in this model is the incorporation of clusters of time slots, which would be based on the chosen targeting effort(s) from the first stage of the problem. As an example, if content targeting were the only chosen targeting method, the clusters would be formed based on the content of the Web page (i.e., one cluster for time slots on the sports page, one cluster for slots on the music page, one cluster for slots on the news page, etc.). These designations would obviously be different for other methods of targeting.


Given that one cluster is the full ad set which includes all of the time slots, each time slot must be assigned to at least one cluster; however, each slot could also be assigned to other clusters. Likewise, each advertisement must be targeted to at least one cluster, but may be targeted to multiple clusters. A two-dimensional input matrix $C_{ij}$ is used to manage the cluster assignments for each ad/time slot combination. The input elements of $C_{ij}$ equal 1 if ad $i$ and time slot $j$ have at least one cluster in common. For example, assume that time slot $j$ represents a banner on the sports page and therefore has been designated as part of the sports cluster. Likewise, assume that ad $i$ is an advertisement for tennis rackets and that, consequently, the advertiser and publisher have decided to target it to time slots in the sports cluster. In this case, since ad $i$ and time slot $j$ have the sports cluster in common, the $C_{ij}$ entry for this ad/time slot combination would equal 1. Had this ad not been targeted to the sports cluster, this $C_{ij}$ matrix entry would have been 0 unless they had another cluster in common (which would be the case if this slot were, for example, also assigned to the outdoors cluster and the decision was made to target tennis racket ads to the outdoors cluster).
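Building $C_{ij}$ from cluster assignments is a simple set-intersection exercise; the sketch below (our code, with hypothetical cluster labels) follows the tennis racket example.

```python
def build_c_matrix(ad_clusters, slot_clusters):
    """C[i][j] = 1 iff ad i and time slot j share at least one cluster.

    ad_clusters: list of sets, one per ad, of the cluster labels it targets.
    slot_clusters: list of sets, one per slot, of the clusters it belongs to.
    """
    return [[1 if ads & slots else 0 for slots in slot_clusters]
            for ads in ad_clusters]

# Tennis racket ad targeted to the sports cluster; slot 0 is on the sports
# page, slot 1 on the music page, slot 2 in both sports and outdoors.
C = build_c_matrix(
    ad_clusters=[{"sports"}],
    slot_clusters=[{"sports"}, {"music"}, {"sports", "outdoors"}],
)
# C == [[1, 0, 1]]
```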


The IP formulation of the problem is as follows:

$\max \sum_{j=1}^{N} \sum_{i=1}^{n} s_i x_{ij}$

subject to

(1) $\sum_{i=1}^{n} s_i x_{ij} \le S, \quad j = 1, 2, \ldots, N$

(2) $L_i - M(1 - y_i) \le \sum_{j=1}^{N} x_{ij} \le U_i + M(1 - y_i), \quad i = 1, 2, \ldots, n$

(3) $-M y_i \le \sum_{j=1}^{N} x_{ij} \le M y_i, \quad i = 1, 2, \ldots, n$

(4) $x_{ij} \le C_{ij}, \quad \forall i, j$

(5) $x_{ij} \in \{0,1\}$, where $x_{ij} = 1$ if ad $i$ is assigned to ad slot $j$, and 0 otherwise

(6) $y_i \in \{0,1\}$, where $y_i = 1$ if ad $i$ is assigned, and 0 otherwise

where:

  $n$ = number of advertisements
  $N$ = number of banner/time slots
  $S$ = banner height
  $s_i$ = height of advertisement $i$, $i = 1, 2, \ldots, n$
  $L_i$ = lower limit on the frequency of ad $i$, $i = 1, 2, \ldots, n$
  $U_i$ = upper limit on the frequency of ad $i$, $i = 1, 2, \ldots, n$
  $M$ = a large penalty greater than the number of banner/time slots
  $C_{ij}$ = 1 if ad $i$ and time slot $j$ have at least one cluster in common, and 0 otherwise, $\forall i, j$

The first constraint ensures that the cumulative area consumed by the ads assigned to any given slot is within the assigned space limitation $S$ of that slot. The second and third constraints ensure that, if an ad is chosen, the number of delivered impressions of that ad falls within the contracted upper and lower limits. The new constraint, constraint (4), ensures that each ad, if served, is served only to time slots belonging to clusters to which the ad has been targeted.


Constraint (5) assures that at most one copy of each ad can appear in any given slot.

5.3.3 The Modified Maxspace Problem with Non-Linear Pricing (MMSwNLP)

For our last extension, we extend the Modified Maxspace Problem to include a non-linear pricing function. Prior modeling efforts have assumed that advertisers are charged a constant rate per unit volume of their advertising, regardless of their level of commitment. This is easily represented in the model formulation; however, in many cases it does not accurately reflect the pricing behavior seen in industry. In an effort to entice advertisers to commit to a larger volume of advertising, instead of using a constant pricing structure, publishers often offer a series of price breaks. From a publisher's standpoint, the obvious goal is to increase overall revenue by improving the efficiency of ad space utilization. These price breaks are commonly implemented via a step pricing function, with the overall continuum of volume commitments subdivided into a series of ranges, each range having its own price per unit volume. The size of the ranges and the relative prices per unit volume will obviously differ from publisher to publisher. We extend our previous model to allow for these additional complexities and to provide alternative solution methods for the resulting problem.
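A step pricing function of this kind is easy to encode; the sketch below (our illustration, with made-up breakpoints and unit prices) returns the revenue for a committed volume.

```python
import bisect

def step_price(volume, breakpoints, prices):
    """Price per unit volume for a committed volume.

    breakpoints: ascending thresholds where the price changes, e.g.
                 [100, 500] splits volume into [0,100), [100,500), [500,inf).
    prices: price per unit volume for each range (len(breakpoints)+1 values).
    """
    return prices[bisect.bisect_right(breakpoints, volume)]

# Hypothetical schedule: cheaper unit prices for larger commitments.
f = lambda v: v * step_price(v, breakpoints=[100, 500],
                             prices=[1.00, 0.90, 0.80])
# f(50) == 50.0, f(200) == 180.0, f(1000) == 800.0
```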


We now provide the associated IP formulation for this problem:

$\max \sum_{j=1}^{N} \sum_{i=1}^{n} f_i(s_i x_{ij})$

subject to

(1) $\sum_{i=1}^{n} s_i x_{ij} \le S, \quad j = 1, 2, \ldots, N$

(2) $L_i - M(1 - y_i) \le \sum_{j=1}^{N} x_{ij} \le U_i + M(1 - y_i), \quad i = 1, 2, \ldots, n$

(3) $-M y_i \le \sum_{j=1}^{N} x_{ij} \le M y_i, \quad i = 1, 2, \ldots, n$

(4) $x_{ij} \in \{0,1\}$, where $x_{ij} = 1$ if ad $i$ is assigned to ad slot $j$, and 0 otherwise

(5) $y_i \in \{0,1\}$, where $y_i = 1$ if ad $i$ is assigned, and 0 otherwise

where:

  $n$ = number of advertisements
  $N$ = number of banner/time slots
  $S$ = banner height
  $s_i$ = height of advertisement $i$, $i = 1, 2, \ldots, n$
  $L_i$ = lower limit on the frequency of ad $i$, $i = 1, 2, \ldots, n$
  $U_i$ = upper limit on the frequency of ad $i$, $i = 1, 2, \ldots, n$
  $M$ = a large penalty greater than the number of banner/time slots
  $f_i(\cdot)$ = the non-linear step function of price per unit volume for ad $i$

The first constraint ensures that the cumulative area consumed by the ads assigned to any given slot is within the assigned space limitation $S$ of that slot. The second and third constraints ensure that, if an ad is chosen, the number of delivered impressions of that ad falls within the contracted upper and lower limits. Constraint (4) assures that at most one copy of each ad can appear in any given slot.

One subtlety of the last two models bears explanation. We made the claim that the last two models were designed to help publishers improve their performance under a hybrid pricing model; however, neither of the models specifically optimizes over any of the mentioned performance-based measures.


The explanation lies in the fact that, although the formulation indicates that we are only optimizing over the amount of space used, similar to the objective function of the base Maxspace problem, the publishers' efforts to improve the performance-based measures are implicitly included via the first-stage targeting and non-linear pricing strategies. This is the common practice in industry.

5.4 Model Solution Approaches

The three online ad scheduling problems introduced above are NP-hard combinatorial optimization problems that are addressed on a daily basis by many online advertising publishers. Their NP-hard nature makes it highly unlikely that they will ever be solved by an efficient optimal algorithm. Therefore, efficient and effective approximation algorithms are necessary. In an effort to further the search for such algorithms, we propose and test several heuristic and metaheuristic approaches for each problem. Our metaheuristic approaches are based on the very popular genetic algorithm and neural network methodologies.

5.4.1 Augmented Neural Network (AugNN)

We have chosen a very promising variation of the traditional neural network architecture, the Augmented Neural Network (AugNN), which was first introduced by Agarwal [106] in 2003. The AugNN approach augments the traditional neural network model to allow for the embedding of domain- and problem-specific knowledge via a base heuristic. This approach takes advantage of both the heuristic approach and the iterative local-search approach. AugNN utilizes a proven base heuristic to exploit the problem-specific structure and then iteratively searches the local neighborhood randomly in an effort to improve upon the initial solution. The chosen base heuristic provides the algorithm with a starting solution in a neighborhood of the search space.


AugNN Notation

  $RF$ : reinforcement factor
  $BF$ : backtracking factor
  $\alpha$ : learning rate coefficient
  $k$ : epoch number
  $\Theta_i$ : weight associated with ad $i$
  $\varepsilon(k)$ : error, i.e., the difference between the current ad schedule value and the upper bound in epoch $k$ (upper bound = total quantity of available ad space)
  $SF$ : stop factor
  $MF$ : learning rate multiplicative factor
  $NBA$ : number of backtracks allowed

After at most $n$ iterations, or an epoch $k$, a feasible solution is generated. A series of relative weights, one associated with each advertisement $i$, are modified after each epoch in an attempt to produce improved neighborhood solutions in future epochs. If improvements are found from epoch to epoch, the weights may be reinforced. In addition, if an improved solution is not identified within a predetermined number of epochs, the algorithm has the ability to backtrack. As a result of its successful use of domain-specific knowledge, this technique seems to avoid being trapped by the extremely inefficient local optima that have plagued many other neural network based techniques. In addition, because it does not involve any relaxations or LP solutions, upon which many alternative techniques are dependent, it is a relatively simple technique in terms of computational complexity. In order to apply this technique, we need a good base heuristic for each of the three problems.


After testing several alternatives, we employ a largest volume least full (LVLF) heuristic for the Modified Maxspace and the Modified Maxspace with Non-Linear Pricing problems, and a Subset-LVLF heuristic for the Modified Maxspace with Ad Targeting problem. Both heuristics are described below. The LVLF heuristic is very similar to the LSLF heuristic introduced by Adler et al. [6], with the only difference being that the ads are sorted based on volume instead of size. We test the heuristics both independently and in combination with the AugNN procedure for each problem.

Largest Volume Least Full (LVLF) Algorithm

  Sort the ads in descending order of volume, utilizing the upper frequency bound for the volume calculation of each ad.
  Assign each of the ads in sorted order. If feasible, assign ad $i$ to the least full slots one at a time until either we reach a time slot which has insufficient capacity to accept ad $i$ or the upper frequency bound $U_i$ for ad $i$ is reached.

Subset Largest Volume Least Full Algorithm

  Classify the ads into two subsets based on their target id. Some of the ads will be targeted to a specific set of time slots while others are untargeted and can be served to any available slot. If the ad is targeted, it is placed in subset $D_t$; otherwise, it is placed in subset $D_{nt}$.
  Sort the ads in descending order of volume, utilizing the upper frequency bound for the calculation.
  Utilizing the LVLF algorithm, assign the ads from subset $D_t$ and then from subset $D_{nt}$, as long as there is sufficient space available.
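A minimal sketch of LVLF follows (our code; the names are ours, and the pre-check on the lower frequency bound reflects our reading of the "if feasible" condition):

```python
def pack(order, N, S, s, L, U):
    """Assign ads (in the given order) to the least full slots, respecting
    slot capacity S, at most one copy per slot, and the frequency bounds."""
    load = [0] * N
    schedule = [[] for _ in range(N)]
    for i in order:
        open_slots = [j for j in range(N) if load[j] + s[i] <= S]
        if len(open_slots) < L[i]:
            continue  # the lower frequency bound cannot be met: skip ad i
        placed = 0
        while placed < U[i]:
            candidates = [j for j in range(N)
                          if load[j] + s[i] <= S and i not in schedule[j]]
            if not candidates:
                break
            j = min(candidates, key=lambda j: load[j])  # least full slot
            load[j] += s[i]
            schedule[j].append(i)
            placed += 1
    return schedule

def lvlf(n, N, S, s, L, U):
    """Largest Volume Least Full: sort ads by volume s_i * U_i, largest
    first, then greedily pack them."""
    order = sorted(range(n), key=lambda i: s[i] * U[i], reverse=True)
    return pack(order, N, S, s, L, U)
```

Subset-LVLF would simply call pack() twice, first with the targeted ads (restricted to their matching slots) and then with the untargeted ads.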


The method by which we modify the augmented neural network weights is determined by the learning strategy. The learning strategy consists of the weight modification formula and any additional methods that are chosen. We employ the following learning strategy:

a. Weight modification formula

$\Theta_i(k+1) = \Theta_i(k) + \alpha\, s_i\, \varepsilon(k), \quad \forall i \in A$

b. Additional methods

In addition, we employ reinforcement and backtracking mechanisms, as detailed in the algorithm below, to improve the solution quality. Our strategy is predicated on the theory that if the error in an epoch is high, the order in which the ads are selected for assignment during the following epoch should be changed more than if the error were small.

AugNN Algorithm

Step 1: Initialize $RF$, $BF$, $SF$, $NBA$, the weights $\Theta_i$, and $k$.
Step 2: Run LVLF or Subset-LVLF (for the Modified Maxspace with Ad Targeting problem).
Step 3: Set t = 1, x = 1, z = 0 and y = 1.
Step 4: Evaluate the fitness of the initial solution based on the objective function of the respective problem.
Step 5: Modify the AugNN weights via the weight modification formula discussed in (a) above.
Step 6: Run LVLF or Subset-LVLF (for the Modified Maxspace with Ad Targeting problem) and set x = x + 1.
Step 7: Evaluate the fitness of the new ad schedule and check its uniqueness. If it is unique, set t = t + 1.
Step 8: If t >= (number of desired unique solutions) or x >= (SF x number of desired unique solutions), terminate and report the best solution so far.
Step 9: If the fitness of the current ad schedule > the best solution so far, reinforce the current set of AugNN weights by replicating the last set of weight modifications RF times. Set y = 1 and return to Step 6.
Step 10: If y < BF, modify the AugNN weights via the weight modification formula discussed in (a) above, set y = y + 1 and return to Step 6.
Step 11: If y >= BF and z < NBA, set y = 1, modify the AugNN weights by resetting them to the best set of weights thus far, set z = z + 1 and return to Step 6.
Step 12: If y >= BF and z >= NBA, set z = 0, set $\alpha = \alpha \times MF$, modify the AugNN weights by resetting them to the best set of weights thus far, set z = z + 1 and return to Step 6.
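The printed update rule applies the same deterministic step to every ad, while the surrounding text says the ordering should change more when the error is large; the sketch below is therefore only one plausible reading, entirely our assumption, in which a per-ad random perturbation is scaled by $\alpha\, s_i\, \varepsilon(k)$. It reuses pack() from the LVLF sketch above; reinforcement and backtracking (Steps 9-12) are omitted for brevity.

```python
import random

def augnn(n, N, S, s, L, U, alpha=0.003, epochs=300, seed=None):
    """Skeletal AugNN loop: weights theta_i reorder the ads between
    epochs, with an update magnitude that grows with the epoch's error."""
    rng = random.Random(seed)
    upper_bound = N * S
    theta = [1.0] * n
    best_schedule, best_value = None, -1
    for _ in range(epochs):
        # weighted LVLF ordering: sort by theta_i * volume
        order = sorted(range(n), key=lambda i: theta[i] * s[i] * U[i],
                       reverse=True)
        schedule = pack(order, N, S, s, L, U)
        value = sum(s[i] for slot in schedule for i in slot)
        if value > best_value:
            best_schedule, best_value = schedule, value
        error = upper_bound - value
        # Theta_i(k+1) = Theta_i(k) + alpha * s_i * eps(k); the random
        # factor is our assumption, added so that identical updates do not
        # leave the sort order unchanged from epoch to epoch.
        for i in range(n):
            theta[i] += alpha * s[i] * error * rng.uniform(-1.0, 1.0)
    return best_schedule, best_value
```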


5.4.2 Genetic Algorithm (GA)

We also employ a genetic algorithm (GA) based approach. For the three proposed problems, each GA chromosome, which can be visualized as a 1 x n vector as depicted below, represents a candidate sequence of the n advertisements $A = \{a_1, a_2, \ldots, a_n\}$:

  a2 | a4 | a1 | a5 | a3

The advertisements are served in the order in which they appear in the respective chromosome. For example, given the basic five-ad chromosome string depicted above, the GA would first attempt to serve ad 2, then ad 4, then ad 1, etc. When attempting to serve a given ad, if there are not at least $L_i$ (the lower frequency bound for ad $i$) time slots with sufficient capacity to accommodate the ad, it is not served at all. Those ads which do meet this feasibility requirement are served until either their upper frequency bound is reached or an attempt has been made to place the respective ad in each time slot. The associated fitness value is measured based on the objective function of the given problem. The three primary operations of a simple genetic algorithm are reproduction, mutation and crossover. We employ a roulette wheel reproduction method, a one-point crossover and a basic ad-swap mutation operator.

GA Notation

  $e$ : elite percentage
  $p_m$ : probability of mutation
  $p_s$ : population size
  $NU$ : number of desired unique solutions
  $CL$ : crossover attempt limit


The GA begins with an initial population of strings which are all created randomly, with the exception of one string which is created using the LVLF or Subset-LVLF heuristic (depending on which problem is being solved). Between generations, we use the elite percentage $e$ to determine how many of the most fit strings will survive unchanged into the next generation. The roulette wheel reproduction operator selects potential parental strings based on their relative fitness values. Each string has a probability of selection which is directly proportional to its fitness value divided by the sum of the fitness values of the entire population. The most fit strings are thereby given the highest probability of selection. Given the binary nature of ad selection in all three of the proposed ad scheduling problems, any ad duplication within a proposed solution string renders it infeasible. As a result, common GA selection and crossover mechanisms struggle to achieve an acceptable level of feasibility for these problems. To overcome this challenge, we use a crossover mechanism developed by Kumar et al. [67] which ensures the feasibility of each new offspring. Having selected two parent strings via the roulette wheel process described above, a single crossover point is randomly selected. In the example depicted in Figure 8, point number five, which falls between ads five and six, was selected. Based on the chosen crossover point and the genetic material of the parents, two child strings are created. The genetic material on the left side of the crossover point in parent 1 is directly inherited by child 1, and similarly for parent 2 and child 2. In our example (see Figure 9), the first set of ads inherited by child one are ads a7, a4, a11, a5 and a8. Up to this point, the proposed crossover method has followed the basic single-point crossover process;


however, the remainder of the process is somewhat different. Unlike the traditional mechanism, the second half of the genetic material which makes up the chromosome string of child one is not directly inherited from parent two. Instead, the ads which make up the second half of child one's string are inherited from the second half of parent one, with the caveat that they are re-ordered based on how they appear in parent two. A similar process is followed for child two. In our basic example, the ads which make up the second half of child one are ads a9, a10, a3, a6, a1 and a2, but they are reordered based on how they appear in parent two (i.e., a2, a1, a10, a9, a3 and a6). This reproduction process has created two new offspring for the next generation. However, before being added into the next population, the new offspring are given an opportunity to mutate based on the predefined probability of mutation ($p_m$). A string selected for mutation will have two randomly selected ads swap places within the string. In the example below (see Figures 10 and 11), it is assumed that the second child has been selected for mutation and that ads a8 and a11 have been randomly selected as mutation candidates. This entire process is repeated from generation to generation until a predefined number of unique solutions has been created or the crossover attempt limit has been exceeded.

  Parent 1:  a7  a4  a11 a5  a8 | a9  a10 a3  a6  a1  a2
  Parent 2:  a2  a8  a1  a5  a10 | a3  a9  a11 a6  a4  a7
             (crossover point after position five)

Figure 8: Selected Parents Prior to Crossover


  Child 1:  a7  a4  a11 a5  a8  a2  a1  a10 a9  a3  a6
  Child 2:  a2  a8  a1  a5  a10 a7  a4  a11 a9  a3  a6

Figure 9: Resulting Offspring

  Child 2:  a2  a8  a1  a5  a10 a7  a4  a11 a9  a3  a6
            (ads a8 and a11 randomly selected for the swap)

Figure 10: Child 2 Prior to Mutation

  Child 2:  a2  a11 a1  a5  a10 a7  a4  a8  a9  a3  a6

Figure 11: Child 2 After Mutation
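A compact implementation of this crossover and swap mutation follows (our code; the function names are ours). The demo reproduces Child 2 of Figure 9.

```python
import random

def crossover(p1, p2, cut):
    """Single-point crossover in which each child keeps the left part of one
    parent and reorders that parent's remaining ads by the order in which
    they appear in the other parent (the feasibility-preserving mechanism
    attributed to Kumar et al. in the text)."""
    def child(a, b):
        left = a[:cut]
        pos = {ad: k for k, ad in enumerate(b)}
        right = sorted(a[cut:], key=lambda ad: pos[ad])
        return left + right
    return child(p1, p2), child(p2, p1)

def mutate(chrom, pm=0.05):
    """With probability pm, swap two randomly chosen ads in the string."""
    chrom = list(chrom)
    if random.random() < pm:
        i, j = random.sample(range(len(chrom)), 2)
        chrom[i], chrom[j] = chrom[j], chrom[i]
    return chrom

p1 = ["a7", "a4", "a11", "a5", "a8", "a9", "a10", "a3", "a6", "a1", "a2"]
p2 = ["a2", "a8", "a1", "a5", "a10", "a3", "a9", "a11", "a6", "a4", "a7"]
c1, c2 = crossover(p1, p2, cut=5)
# c2 == ['a2','a8','a1','a5','a10','a7','a4','a11','a9','a3','a6']  (Figure 9)
```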


GA Algorithm

Step 1: Initialize.
Step 2: Apply LVLF or Subset-LVLF (for the Modified Maxspace with Ad Targeting problem) and insert the resulting solution as the first string in the initial GA population.
Step 3: Fill the initial population by creating ($p_s$ - 1) random chromosomes.
Step 4: Set t = 1 and c = 0.
Step 5: For each string, attempt to assign each of the ads in the order in which it appears in the string. If feasible, assign ad $i$ to the least full slots one at a time until we either reach a time slot which has insufficient capacity to accept ad $i$ or the upper frequency bound $U_i$ for ad $i$ is reached. Evaluate the fitness of each string based on the objective function of the respective problem. Check each chromosome for uniqueness; for each unique string, set t = t + 1.
Step 6: Sort the strings in descending order of their relative fitness values.
Step 7: Populate the elite list by selecting the best (e x $p_s$) strings based on their relative fitness values. These strings are added to the next population.
Step 8: Utilizing the roulette wheel selection method, select two parent strings for reproduction and cross them over. Set c = c + 1.
Step 9: Mutate the resulting children based on the mutation probability.
Step 10: Add the children strings to the next population and test them for uniqueness. For each child that is unique, set t = t + 1.
Step 11: If t >= NU or c >= CL, calculate the fitness values for the strings in the new population and terminate, reporting the best solution so far.
Step 12: If the number of chromosomes in the next population >= $p_s$, go to Step 5; otherwise, go to Step 8.

5.4.3 Hybrid Technique

Both the AugNN and GA methods are expected to perform reasonably well on the three proposed problems; however, in some cases combined methods can be employed which leverage the best aspects of multiple techniques. Based on this intuition, our final solution approach for the three proposed problems is a new hybrid technique which combines the AugNN and GA methods. The hybrid method employs the AugNN method to search the best local neighborhood discovered after each generation of the genetic algorithm. If the AugNN is able to find an improved solution, that solution is then fed into the next population of the GA; otherwise, the GA proceeds normally. This process repeats until the desired number of unique solutions has been found.

Hybrid Notation

  $NUA$ : number of desired unique AugNN solutions
  $NU$ : number of desired unique solutions (total)


Hybrid Algorithm

Step 1: Run one generation of the genetic algorithm as described.
Step 2: Develop a set of AugNN weights which replicates the best ad schedule discovered by the GA.
Step 3: Run the AugNN until the number of unique AugNN solutions >= NUA.
Step 4: If the AugNN process improves upon the current best solution, feed the associated ad schedule as a string into the next population of the GA.
Step 5: Repeat Steps 1-4 until the number of unique solutions >= NU.

5.4.4 Parameter Selection

As discussed by Aytug et al. [66], one of the biggest concerns with respect to the application of black-box algorithms such as neural networks and genetic algorithms is the absence of theoretical guidance on how the parameter settings should be selected. The techniques are very powerful and extremely popular, but their effectiveness may vary considerably based on the ability to find a good set of values for the numerous algorithmic parameters and settings of each technique. For the GA based methods, these include population size, mutation probability and elite list percentage. For the AugNN based methods, these include the learning rate, backtracking factor, reinforcement factor, learning rate multiplicative factor and the number of backtracks allowed. For a more detailed explanation of each of these parameters, please see Appendix A.

In developing a method of parameter selection, researchers are often enticed to utilize the widely criticized practice of parameter tuning, in which the parameters are tuned to different settings for each problem set. This technique may provide improved results, but it is often impractical in industry and also brings into question any assumptions made concerning the generalizability of the technique. We avoid this practice. Alternatively, in an effort to gain a better understanding of the robustness of each of the proposed techniques, we maintain a consistent set of parameter settings across all of the problem sets for each of the three problems.


In determining which parameter settings to use for each problem, we used prior applications of the techniques for guidance and then selected a good set of parameter settings based on a series of pilot runs. For each of the three problems, our pilot runs consisted of 54 problems: 2 problems arbitrarily selected from each size of problem, as described in the next section. The final parameter sets utilized for the project are given in Tables 2-4.

Table 2: AugNN Parameter Values

  Problem   Unique Sol  LR     BF  NBA  RF  SF
  MMS       300         0.003  8   7    3   10
  MMSwAT    300         0.001  8   5    2   10
  MMSwNLP   300         0.05   8   5    2   10

Table 3: GA Parameter Values

  Problem   Uniq Soln  PS  Mut Prob  Elite %  Cross Limit
  MMS       300        80  0.05      0.25     400
  MMSwAT    300        80  0.05      0.35     400
  MMSwNLP   300        80  0.1       0.1      400

Table 4: Hybrid Parameter Values

  Problem   Uniq GA Soln  PS  Mut Prob  Elite %  Cross Lim  AugNN LR  AugNN Uniq Soln  SF
  MMS       300           40  0.05      0.25     400        0.01      150              5
  MMSwAT    300           40  0.05      0.35     400        0.005     150              5
  MMSwNLP   300           40  0.1       0.1      400        0.05      150              5

5.5 Problem Set Development

For each of the three problems, we needed a good set of test problems. We wanted to develop a strong set of problems which would give us, and researchers that follow, the opportunity to evaluate the relative effectiveness and scalability of proposed solution methodologies. To achieve this goal, for each of the three problems, we created 27 problem sets of different sizes and difficulties.


Each problem set consists of 10 individual problems with a predetermined number of time slots $N$, each of a predetermined size $S$. If we follow the common precedent of prior researchers and assume that the ads are flipped once per minute, the planning horizon covered by our test problems ranges from half an hour to an entire day. Prior work on the Maxspace problem had limited the planning horizon to 100 minutes; we chose to expand this horizon in an effort to appeal to a larger set of publishers. The size $s_i$ and frequency bounds $L_i$ and $U_i$ of ad $A_i$ in any test problem are randomly generated, with values which vary uniformly between $S/3$ and $2S/3$, where $S$ is the size of the time slots for that particular problem. In their work on the Maxspace problem, Kumar et al. [6] discovered that this method of generating ad sizes fosters the creation of more difficult problems; therefore, we also employ it in our work. For the Modified Maxspace with Ad Targeting problem sets, each time slot is randomly assigned a target id between 1 and 3. Similarly, each advertisement is randomly assigned a target id between 1 and 4. An advertisement assigned a target id of 4 is considered an untargeted ad which can be served in any available time slot. All of the other ads are targeted and can only be served to time slots which match their target id.
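A generator in this spirit might look as follows (our sketch; drawing two values and sorting them to obtain $L_i \le U_i$ is our assumption, since the text does not state how the bounds are ordered):

```python
import random

def make_problem(n, N, S, targeting=False):
    """Generate one random test problem in the style of Section 5.5."""
    lo, hi = S // 3, 2 * S // 3
    s = [random.randint(lo, hi) for _ in range(n)]  # ad sizes in [S/3, 2S/3]
    # frequency bounds drawn from the same range; order each pair so L <= U
    LU = [sorted((random.randint(lo, hi), random.randint(lo, hi)))
          for _ in range(n)]
    problem = {"s": s, "L": [b[0] for b in LU], "U": [b[1] for b in LU],
               "N": N, "S": S}
    if targeting:  # MMSwAT: slots get target ids 1-3; ads 1-4 (4 = untargeted)
        problem["slot_target"] = [random.randint(1, 3) for _ in range(N)]
        problem["ad_target"] = [random.randint(1, 4) for _ in range(n)]
    return problem
```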


CHAPTER 6
RESULTS

In this chapter we provide the results of our empirical tests for both the IR Based Ad Targeting and the Online Advertisement Scheduling sections of the project.

6.1 Information Retrieval Based Ad Targeting Results

In this section, we report the results of our information retrieval based ad targeting experiments. Each user was asked to rank his or her level of interest, on a scale of 1 to 5, in a set of ads, some of which were selected randomly and the remainder of which were selected based on one of the three weighting schemes. As discussed in Section 5.2, within the framework of our ad targeting process, we tested three different weighting schemes in an effort to identify the best HTML structural element weighting combination. The tested schemes are detailed in Table 1, Section 5.2. The relative effectiveness of each of the advertisement selection methods was determined based on the mean score of the user rankings for the associated set of ads. Since the underlying structure of the ranking scale is such that a higher score indicates an increased level of interest, we assume that a method which selects a group of ads with a higher mean user ranking score is more effective than the alternative. Throughout this section, we use the unpaired T test to evaluate the statistical significance of the difference in mean user rankings between sets of selected ads. An important assumption of the T test is that the dependent variable is normally distributed. In our analysis, the student rankings represent the dependent variable and, based on the results of the Q-Q plot given in Figure 12, we are relatively confident that this assumption has been met.


In addition, utilization of the T test requires a careful analysis of the respective variances of the compared data sets. We utilized Levene's test of equal variances for this part of the analysis. If the significance level of Levene's test is greater than or equal to .05, the "equal variances assumed" row of the table is applicable; otherwise, the "equal variances not assumed" row must be used to determine the significance of the associated T test.

[Figure omitted: normal Q-Q plot of student ratings, observed value vs. expected normal value]

Figure 12: Q-Q Plot of Student Response Values
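This decision rule is straightforward to reproduce with SciPy (our illustration; the function and variable names are ours):

```python
from scipy import stats

def compare_groups(ratings_a, ratings_b, alpha=0.05):
    """Unpaired T test preceded by Levene's test for equal variances.
    Welch's correction is applied when Levene's test is significant."""
    _, levene_p = stats.levene(ratings_a, ratings_b)
    equal_var = levene_p >= alpha  # the "equal variances assumed" row
    t_stat, p_val = stats.ttest_ind(ratings_a, ratings_b,
                                    equal_var=equal_var)
    return t_stat, p_val, equal_var
```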


We first compare the effectiveness of the proposed IR based targeting method with a random selection process. The output of the IR Based Ad Targeting model is a set of weights/scores (please see Section 5.2 for a more detailed discussion of how the weights/scores are assigned), one of which is assigned to each advertisement in the corpus. Based on the model design, a higher weight implies a greater fit between the given product and the interests of the respective user; therefore, the ads are served to a given user in descending order of their weight/score. We acknowledge that, depending on the size of a publisher's ad corpus and the length of surfing time for a particular user, the percentage of ads which may be served to a user will vary; however, for this part of our experiment we assume that each user is served exactly 20% of our advertisement corpus, which consists of 100 ads. Based on this assumption, the student rankings for the top 20 ads selected by each of the IR methods (one set for each weighting scheme) are compared with the rankings for 20 randomly selected ads. Table 5 below provides a summary of the mean student rank values for each of the ad selection methods. The detailed T test results for this analysis can be found in Tables 6-8.

Table 5: Summary of Mean Student Rankings for the Four Selection Methods

  Ad Selection Method                         Mean Student Ranking
  IR Based Method with Weighting Scheme 2     2.71
  IR Based Method with Weighting Scheme 1     2.69
  IR Based Method with Weighting Scheme 3     2.63
  Random Selection Method                     2.50


Table 6: T Test - Scheme 1 vs. Random Selection
  Group statistics:  Scheme 1: N = 943, Mean = 2.69, SD = 1.388, SE = 0.045  |  Random: N = 853, Mean = 2.50, SD = 1.361, SE = 0.047
  Levene's test: F = 0.427, Sig. = 0.51 (equal variances assumed)
  Equal variances assumed:      t = 2.954, df = 1794, Sig. (2-tailed) = 0.003**, Mean Diff = 0.19, SE = 0.065, 95% CI [0.065, 0.319]
  Equal variances not assumed:  t = 2.957, df = 1782, Sig. (2-tailed) = 0.003,   Mean Diff = 0.19, SE = 0.065, 95% CI [0.065, 0.319]
  (** marks the row applicable under Levene's test, as in the original tables.)

Table 7: T Test - Scheme 2 vs. Random Selection
  Group statistics:  Scheme 2: N = 942, Mean = 2.71, SD = 1.394, SE = 0.045  |  Random: N = 853, Mean = 2.50, SD = 1.361, SE = 0.047
  Levene's test: F = 0.466, Sig. = 0.50 (equal variances assumed)
  Equal variances assumed:      t = 3.252, df = 1793, Sig. (2-tailed) = 0.001**, Mean Diff = 0.21, SE = 0.065, 95% CI [0.084, 0.340]
  Equal variances not assumed:  t = 3.255, df = 1783, Sig. (2-tailed) = 0.001,   Mean Diff = 0.21, SE = 0.065, 95% CI [0.084, 0.339]


Table 8: T Test - Scheme 3 vs. Random Selection
  Group statistics:  Scheme 3: N = 938, Mean = 2.63, SD = 1.381, SE = 0.045  |  Random: N = 853, Mean = 2.50, SD = 1.361, SE = 0.047
  Levene's test: F = 0.272, Sig. = 0.60 (equal variances assumed)
  Equal variances assumed:      t = 2.062, df = 1789, Sig. (2-tailed) = 0.039**, Mean Diff = 0.13, SE = 0.065
  Equal variances not assumed:  t = 2.064, df = 1777, Sig. (2-tailed) = 0.039,   Mean Diff = 0.13, SE = 0.065


From these results, it is clear that the mean student rankings from each of the three weighting schemes are greater than the mean rankings of randomly served ads, at a significance level of .039 or less in each case. This outcome supports Hypothesis 1, presented in Chapter 5. In an effort to further evaluate the effectiveness of the IR based ad targeting method, we subdivided the set of ads into groups of 20 based on the relative weight/score assigned by the IR based process. The top 20 ads were assigned an id of 5, the next 20 highest ranked ads were assigned an id of 4, and so on, with the lowest 20 ranked ads being assigned an id of 1. Table 9 provides a summary of the mean values for each weighting scheme/ad subset.

Table 9: Summary of Mean Student Rankings for the Three Weighting Schemes

  Ad Group   Scheme 1   Scheme 2   Scheme 3
  1          1.99       2.01       2.09
  2          2.38       2.40       2.32
  3          2.37       2.29       2.38
  4          2.54       2.52       2.64
  5          2.69       2.71       2.63
  (entries are mean student rankings)

The purpose of this part of the analysis is to determine, within each weighting scheme, whether the IR ad targeting methodology is successful in segmenting the corpus of ads into groups which have statistically significant differences in their mean rankings. If the methodology were ineffective, we would see mean student rankings which are essentially the same for the top 20 ads, the next 20 ads, and so forth. The detailed T test results for this analysis can be found in Tables 10-21. The results indicate that, in the majority of cases, the mean rankings between groups are significantly different.


Table 10: T Test - Scheme 1-5 vs. Scheme 1-1
  Group statistics:  Scheme 1-5: N = 943, Mean = 2.69, SD = 1.388, SE = 0.045  |  Scheme 1-1: N = 171, Mean = 1.99, SD = 1.234, SE = 0.094
  Levene's test: F = 15.593, Sig. = 0.00 (equal variances not assumed)
  Equal variances assumed:      t = 6.142, df = 1112, Sig. (2-tailed) = 0.000, Mean Diff = 0.70, SE = 0.114, 95% CI [0.475, 0.920]
  Equal variances not assumed:  t = 6.662, df = 254.51, Sig. (2-tailed) = 0.000**, Mean Diff = 0.70, SE = 0.105, 95% CI [0.491, 0.903]

Table 11: T Test - Scheme 1-5 vs. Scheme 1-2
  Group statistics:  Scheme 1-5: N = 943, Mean = 2.69, SD = 1.388, SE = 0.045  |  Scheme 1-2: N = 191, Mean = 2.38, SD = 1.386, SE = 0.100
  Levene's test: F = 0.043, Sig. = 0.84 (equal variances assumed)
  Equal variances assumed:      t = 2.856, df = 1132, Sig. (2-tailed) = 0.004**, Mean Diff = 0.31, SE = 0.110, 95% CI [0.098, 0.530]
  Equal variances not assumed:  t = 2.859, df = 272.81, Sig. (2-tailed) = 0.005, Mean Diff = 0.31, SE = 0.110, 95% CI [0.098, 0.531]


Table 12: T Test - Scheme 1-5 vs. Scheme 1-3
  Group statistics:  Scheme 1-5: N = 943, Mean = 2.69, SD = 1.388, SE = 0.045  |  Scheme 1-3: N = 189, Mean = 2.37, SD = 1.255, SE = 0.091
  Levene's test: F = 4.754, Sig. = 0.03 (equal variances not assumed)
  Equal variances assumed:      t = 2.947, df = 1130, Sig. (2-tailed) = 0.003, Mean Diff = 0.32, SE = 0.109, 95% CI [0.107, 0.535]
  Equal variances not assumed:  t = 3.152, df = 288.06, Sig. (2-tailed) = 0.002**, Mean Diff = 0.32, SE = 0.102, 95% CI [0.121, 0.522]

Table 13: T Test - Scheme 1-5 vs. Scheme 1-4
  Group statistics:  Scheme 1-5: N = 943, Mean = 2.69, SD = 1.388, SE = 0.045  |  Scheme 1-4: N = 310, Mean = 2.54, SD = 1.326, SE = 0.075
  Levene's test: F = 2.757, Sig. = 0.10 (equal variances assumed)
  Equal variances assumed:      t = 1.699, df = 1251, Sig. (2-tailed) = 0.090**, Mean Diff = 0.15, SE = 0.090, 95% CI [-0.024, 0.329]
  Equal variances not assumed:  t = 1.739, df = 548.53, Sig. (2-tailed) = 0.083, Mean Diff = 0.15, SE = 0.088, 95% CI [-0.020, 0.325]


Table 14: T Test - Scheme 2-5 vs. Scheme 2-1
  Group statistics:  Scheme 2-5: N = 942, Mean = 2.71, SD = 1.394, SE = 0.045  |  Scheme 2-1: N = 175, Mean = 2.01, SD = 1.246, SE = 0.094
  Levene's test: F = 12.818, Sig. = 0.00 (equal variances not assumed)
  Equal variances assumed:      t = 6.199, df = 1115, Sig. (2-tailed) = 0.000, Mean Diff = 0.70, SE = 0.113, 95% CI [0.478, 0.921]
  Equal variances not assumed:  t = 6.695, df = 261.71, Sig. (2-tailed) = 0.000**, Mean Diff = 0.70, SE = 0.105, 95% CI [0.494, 0.906]

Table 15: T Test - Scheme 2-5 vs. Scheme 2-2
  Group statistics:  Scheme 2-5: N = 942, Mean = 2.71, SD = 1.394, SE = 0.045  |  Scheme 2-2: N = 188, Mean = 2.40, SD = 1.378, SE = 0.101
  Levene's test: F = 0.176, Sig. = 0.67 (equal variances assumed)
  Equal variances assumed:      t = 2.763, df = 1128, Sig. (2-tailed) = 0.006**, Mean Diff = 0.31, SE = 0.111, 95% CI [0.089, 0.525]
  Equal variances not assumed:  t = 2.783, df = 268.86, Sig. (2-tailed) = 0.006, Mean Diff = 0.31, SE = 0.110, 95% CI [0.090, 0.524]


Table 16: T Test - Scheme 2-5 vs. Scheme 2-3
  Group statistics:  Scheme 2-5: N = 942, Mean = 2.71, SD = 1.394, SE = 0.045  |  Scheme 2-3: N = 202, Mean = 2.29, SD = 1.245, SE = 0.088
  Levene's test: F = 6.927, Sig. = 0.01 (equal variances not assumed)
  Equal variances assumed:      t = 3.997, df = 1142, Sig. (2-tailed) = 0.000, Mean Diff = 0.42, SE = 0.106, 95% CI [0.216, 0.632]
  Equal variances not assumed:  t = 4.300, df = 318.69, Sig. (2-tailed) = 0.000**, Mean Diff = 0.42, SE = 0.099, 95% CI [0.230, 0.618]

Table 17: T Test - Scheme 2-5 vs. Scheme 2-4
  Group statistics:  Scheme 2-5: N = 942, Mean = 2.71, SD = 1.394, SE = 0.045  |  Scheme 2-4: N = 297, Mean = 2.52, SD = 1.305, SE = 0.076
  Levene's test: F = 3.586, Sig. = 0.06 (equal variances assumed)
  Equal variances assumed:      t = 2.110, df = 1237, Sig. (2-tailed) = 0.035**, Mean Diff = 0.19, SE = 0.091, 95% CI [0.013, 0.372]
  Equal variances not assumed:  t = 2.183, df = 525.71, Sig. (2-tailed) = 0.029, Mean Diff = 0.19, SE = 0.088, 95% CI [0.019, 0.366]


Table 18: T Test - Scheme 3-5 vs. Scheme 3-1
  Group statistics:  Scheme 3-5: N = 938, Mean = 2.63, SD = 1.381, SE = 0.045  |  Scheme 3-1: N = 180, Mean = 2.09, SD = 1.296, SE = 0.097
  Levene's test: F = 6.677, Sig. = 0.01 (equal variances not assumed)
  Equal variances assumed:      t = 4.890, df = 1116, Sig. (2-tailed) = 0.000, Mean Diff = 0.54, SE = 0.111, 95% CI [0.326, 0.763]
  Equal variances not assumed:  t = 5.107, df = 263.18, Sig. (2-tailed) = 0.000**, Mean Diff = 0.54, SE = 0.107, 95% CI [0.334, 0.754]

Table 19: T Test - Scheme 3-5 vs. Scheme 3-2
  Group statistics:  Scheme 3-5: N = 938, Mean = 2.63, SD = 1.381, SE = 0.045  |  Scheme 3-2: N = 180, Mean = 2.32, SD = 1.319, SE = 0.098
  Levene's test: F = 1.894, Sig. = 0.17 (equal variances assumed)
  Equal variances assumed:      t = 2.787, df = 1116, Sig. (2-tailed) = 0.005**, Mean Diff = 0.31, SE = 0.112, 95% CI [0.092, 0.530]
  Equal variances not assumed:  t = 2.876, df = 260.10, Sig. (2-tailed) = 0.004, Mean Diff = 0.31, SE = 0.108, 95% CI [0.098, 0.524]


Table 20: T Test - Scheme 3-5 vs. Scheme 3-3
  Group statistics:  Scheme 3-5: N = 938, Mean = 2.63, SD = 1.381, SE = 0.045  |  Scheme 3-3: N = 175, Mean = 2.38, SD = 1.303, SE = 0.098
  Levene's test: F = 1.582, Sig. = 0.21 (equal variances assumed)
  Equal variances assumed:      t = 2.221, df = 1111, Sig. (2-tailed) = 0.027**, Mean Diff = 0.25, SE = 0.113, 95% CI [0.029, 0.472]
  Equal variances not assumed:  t = 2.312, df = 252.57, Sig. (2-tailed) = 0.022, Mean Diff = 0.25, SE = 0.108, 95% CI [0.037, 0.464]

Table 21: T Test - Scheme 3-5 vs. Scheme 3-4
  Group statistics:  Scheme 3-5: N = 938, Mean = 2.63, SD = 1.381, SE = 0.045  |  Scheme 3-4: N = 331, Mean = 2.64, SD = 1.351, SE = 0.074
  Levene's test: F = 0.531, Sig. = 0.47 (equal variances assumed)
  Equal variances assumed:      t = -0.530, df = 1267, Sig. (2-tailed) = 0.597**, Mean Diff = -0.01, SE = 0.088, 95% CI [-0.219, 0.126]
  Equal variances not assumed:  t = -0.535, df = 590.21, Sig. (2-tailed) = 0.593, Mean Diff = -0.01, SE = 0.087, 95% CI [-0.217, 0.124]


The final focus of our analysis for this portion of the project is a detailed comparison of the results for the three different weighting schemes employed within the framework of our IR based ad targeting model. Although many different sections compose an HTML document, based on a detailed analysis of the raw HTML code of several documents, we chose to focus this set of experiments on three primary sections: the body, the title and the keywords section. Recall that Hypothesis 2 from Chapter 5 posited that the keyword section of the HTML documents provides more information than the other sections within our model. By using different weight combinations for these HTML sections, we hoped that the associated results would help us improve our understanding of the level of importance and informational value of these different structural elements. This, in turn, would allow us to improve overall model performance by pointing us to a good set of weights. For this series of tests, we once again analyzed the mean student responses for the top 20 ads as selected by each of the weighting schemes. The detailed T test results can be found in Tables 22-24.


Table 22: T Test - Scheme 1 vs. Scheme 2
  Group statistics:  Scheme 1: N = 943, Mean = 2.69, SD = 1.388, SE = 0.045  |  Scheme 2: N = 942, Mean = 2.71, SD = 1.394, SE = 0.045
  Levene's test: F = 0.001, Sig. = 0.97 (equal variances assumed)
  Equal variances assumed:      t = -0.310, df = 1881, Sig. (2-tailed) = 0.757**, Mean Diff = -0.02, SE = 0.064, 95% CI [-0.146, 0.106]
  Equal variances not assumed:  t = -0.310, df = 1880.95, Sig. (2-tailed) = 0.757, Mean Diff = -0.02, SE = 0.064, 95% CI [-0.146, 0.106]

Table 23: T Test - Scheme 1 vs. Scheme 3
  Group statistics:  Scheme 1: N = 943, Mean = 2.69, SD = 1.388, SE = 0.045  |  Scheme 3: N = 938, Mean = 2.63, SD = 1.381, SE = 0.045
  Levene's test: F = 0.017, Sig. = 0.89 (equal variances assumed)
  Equal variances assumed:      t = 0.911, df = 1877, Sig. (2-tailed) = 0.362**, Mean Diff = 0.06, SE = 0.064, 95% CI [-0.067, 0.183]
  Equal variances not assumed:  t = 0.911, df = 1877.00, Sig. (2-tailed) = 0.362, Mean Diff = 0.06, SE = 0.064, 95% CI [-0.067, 0.183]


Table 24: T Test - Scheme 2 vs. Scheme 3
  Group statistics:  Scheme 2-5: N = 942, Mean = 2.71, SD = 1.394, SE = 0.045  |  Scheme 3-5: N = 938, Mean = 2.63, SD = 1.381, SE = 0.045
  Levene's test: F = 0.027, Sig. = 0.87 (equal variances assumed)
  Equal variances assumed:      t = 1.219, df = 1876, Sig. (2-tailed) = 0.223**, Mean Diff = 0.08, SE = 0.064, 95% CI [-0.047, 0.204]
  Equal variances not assumed:  t = 1.219, df = 1875.96, Sig. (2-tailed) = 0.223, Mean Diff = 0.08, SE = 0.064, 95% CI [-0.047, 0.204]

The results of this analysis, along with an expanded explanation of the prior analyses, are presented in the next section.


6.2 Discussion of the Information Retrieval Based Ad Targeting Results

Hypothesis 1 states that the IR based ad selection method will be more effective than the commonly used random model in selecting a subset of targeted ads from a given advertisement corpus with respect to the level of interest to a given user. This hypothesis is supported by the results detailed in Tables 5-8. With respect to the chosen evaluation measure, the IR based selection process outperformed the random selection process with all three weighting schemes, and all of the associated T tests are statistically significant at the .05 level. These results are very promising, but we wanted to push our analysis of the IR based technique a little further. Since the initial experiments focused only on the top 20% of the ads, we next turned our focus to the other 80%. Given that the IR based technique assigns a relevance weight/score to every advertisement, we were anxious to see how the technique performed on those ads that did not fall at the top. As indicated by the summarized mean scores in Table 9, the technique once again performed relatively well. The overall trend of student rank responses was very consistent with the weights assigned by the IR based model.

Hypothesis 2 states that the keyword section of the HTML documents provides more information than the other sections within our model. We tested this hypothesis through the utilization of three different weighting schemes within the IR based ad targeting model. We had hoped that the performance of the IR based method with the different weighting schemes would provide the support needed to prove this hypothesis; unfortunately, it did not. Although there seems to be a good case for the keyword, title and body sections being ranked in that order with respect to their informational value, the resulting T tests were not significant at our chosen level of .05; therefore, we are not able, at this time, to draw any definitive conclusions. However, we do feel that the results warrant future focus and research.

PAGE 112

We are optimistic that testing additional weighting schemes which place the entire weight on individual sections of the html documents will provide the conclusive evidence needed to support Hypothesis 2.

6.3 Online Advertisement Scheduling Results

The proposed heuristic and metaheuristic approaches were applied to each data set, and in this chapter their performances are compared in an effort to evaluate their relative effectiveness and scalability on each of the three problems. Aytug et al. [66], and others, have noted that when developing solution techniques for NP-hard optimization problems, such as those proposed here, it is very beneficial and revealing to compare the performance of the proposed technique against the most powerful techniques known for that particular problem. Unfortunately, since these are new problems, we do not have that opportunity. However, we will make our data sets publicly available, and we are hopeful that other researchers will use them to further extend the collective effort to find improved approximation algorithms for the proposed problems, thereby providing better benchmarks for our proposed methods. All of the computational studies were performed on a Pentium 4 computer with a 3.6 GHz processor, 1 GB of RAM and the Windows XP operating system.

The results for the Modified Maxspace Problem, the Modified Maxspace Problem with Ad Targeting and the Modified Maxspace Problem with Nonlinear Pricing are presented in Tables 25, 26 and 27 respectively. Each line in the tables summarizes the 10 problems that were randomly created for that particular combination of N and S. The % Gap values reported in the tables represent the associated technique's percentage deviation from the upper bound. The upper bound is set equal to the total volume of available

PAGE 113

advertising space (i.e., N x S). Using the MMS problem as an example, % Gap LVLF is calculated as follows:

% Gap LVLF = [(UB - LVLF) / UB] x 100

In addition, we also provide the percentage improvement of the three metaheuristics over the base heuristic for the particular problem. Once again using the MMS problem as an example, % Imp in Avg Gap of AugNN over LVLF is calculated as follows:

% Imp in Avg Gap of AugNN over LVLF = [(% Gap LVLF - % Gap AugNN) / % Gap LVLF] x 100

Similar calculations are made for the other problems and for each metaheuristic. Columns 11-14 of each table report the average CPU time in seconds for each of the tested techniques. For the metaheuristics, if all 10 problems of a given problem set were solved in less than one second, the reported time is 0 seconds. Since the base heuristics solved every problem in less than one second, we have simply reported the associated CPU time as < 1.
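For concreteness, a minimal sketch of these two calculations follows (the function names are ours; the example numbers are taken from problem set 1 of Table 25):

    def pct_gap(upper_bound, solution_value):
        # Percentage deviation of a solution from the upper bound (N x S).
        return (upper_bound - solution_value) / upper_bound * 100

    def pct_improvement(gap_base, gap_meta):
        # Percentage improvement of a metaheuristic's gap over the base heuristic's gap.
        return (gap_base - gap_meta) / gap_base * 100

    # Problem set 1 of Table 25: % Gap LVLF = 9.25 and % Gap AugNN = 5.31.
    print(pct_improvement(9.25, 5.31))  # about 42.6; Table 25 reports 42.65, averaged over 10 instances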

PAGE 114

6.3.1 Modified Maxspace (MMS) Problem Results

Table 25: Problem Results
Legend: G = avg % gap from the upper bound; I = % improvement in avg gap over LVLF; T = avg CPU time in seconds.

Set     N    S   G-LVLF  G-AugNN  G-GA   G-Hyb  I-AugNN  I-GA    I-Hyb   T-LVLF  T-AugNN  T-GA   T-Hyb
1      30   25   9.25    5.31     3.31   3.27   42.65    64.27   64.70   <1      0        0      0.00
2      30   50   12.11   4.55     3.11   2.90   62.39    74.34   76.05   <1      0        0      0.00
3      30   75   15.68   5.36     4.37   3.44   65.82    72.11   78.09   <1      0        0      0.00
4      60   25   7.47    2.07     0.92   0.53   72.35    87.69   92.86   <1      0        0      0.10
5      60   50   13.25   3.47     1.59   1.22   73.79    88.00   90.82   <1      0        0      0.00
6      60   75   12.46   3.76     2.50   1.49   69.84    79.92   88.01   <1      0        0      0.00
7      90   25   5.90    1.96     0.29   0.41   66.77    95.03   92.99   <1      0.4      0      1.00
8      90   50   6.87    3.17     0.51   0.33   53.80    92.59   95.18   <1      0.6      0      1.00
9      90   75   10.49   3.80     0.76   0.42   63.82    92.79   96.01   <1      0.7      0      1.00
10    120   25   8.38    1.74     0.05   0.20   79.19    99.40   97.57   <1      1        0      2.00
11    120   50   9.25    2.48     0.57   0.26   73.18    93.80   97.19   <1      1        0      1.90
12    120   75   8.92    2.16     0.63   0.45   75.76    92.91   94.92   <1      1.4      0      2.00
13    150   25   6.29    0.73     0.50   0.14   88.34    92.07   97.79   <1      1.1      0      3.20
14    150   50   4.39    1.92     0.71   0.20   56.14    83.86   95.38   <1      1.1      0      3.20
15    150   75   6.01    1.33     0.84   0.35   77.82    85.97   94.21   <1      1.8      0      3.20
16    180   25   4.08    0.79     0.18   0.11   80.69    95.70   97.33   <1      1.7      0.5    5.10
17    180   50   7.29    1.19     0.48   0.10   83.70    93.46   98.69   <1      2.1      0.6    5.10
18    180   75   5.81    1.62     0.55   0.26   72.04    90.47   95.54   <1      2.4      0.5    5.20
19    360   25   3.66    1.11     0.44   0.00   69.77    88.06   100.00  <1      5.5      2.8    17.70
20    360   50   2.38    1.63     0.61   0.00   31.55    74.38   100.00  <1      6        2.6    17.40
21    360   75   3.67    0.79     0.38   0.05   78.62    89.70   98.65   <1      6.4      2.6    17.50
22    720   25   1.13    1.12     0.42   0.00   0.20     62.95   100.00  <1      22.6     13.4   74.00
23    720   50   5.16    3.21     0.90   0.00   37.86    82.62   100.00  <1      25.3     13.6   73.90
24    720   75   5.07    0.98     0.71   0.00   80.71    85.94   100.00  <1      24.2     13.1   73.90
25   1440   25   3.50    1.36     0.60   0.00   61.08    82.74   100.00  <1      84.9     54.8   297.30

PAGE 115

Table 25 Continued

Set     N    S   G-LVLF  G-AugNN  G-GA   G-Hyb  I-AugNN  I-GA    I-Hyb   T-LVLF  T-AugNN  T-GA   T-Hyb
26   1440   50   8.46    3.46     0.50   0.00   59.09    94.14   100.00  <1      90.3     54.3   299.60
27   1440   75   4.38    0.89     0.85   0.03   79.59    80.59   99.32   <1      109.8    56.4   298.00
AVG              7.08    2.30     1.01   0.60   65.06    85.76   94.12   <1      14.46    7.97   44.57

6.3.2 The Modified Maxspace with Ad Targeting (MMSwAT) Problem Results

Table 26: MMSwAT Comparison of Results
Legend as in Table 25; for this problem the base heuristic is the Subset LVLF, to which the G-LVLF and improvement columns refer.

Set     N    S   G-LVLF  G-AugNN  G-GA   G-Hyb  I-AugNN  I-GA    I-Hyb   T-LVLF  T-AugNN  T-GA   T-Hyb
1      30   25   27.21   22.97    5.19   5.19   15.58    80.94   80.94   <1      0.2      0      0.40
2      30   50   30.80   25.61    6.76   6.75   16.84    78.05   78.10   <1      0        0      0.40
3      30   75   31.07   25.70    6.84   5.68   17.29    77.99   81.72   <1      0.1      0      0.30
4      60   25   18.31   10.71    1.72   0.93   41.48    90.60   94.94   <1      2.1      0      3.40
5      60   50   19.94   8.26     2.28   1.46   58.57    88.58   92.66   <1      2.2      0      2.70
6      60   75   20.60   8.39     2.36   2.02   59.25    88.54   90.18   <1      2.2      0      2.60
7      90   25   22.28   11.62    2.13   1.66   47.83    90.42   92.54   <1      4.1      0.2    4.80
8      90   50   24.83   9.56     1.53   1.90   61.48    93.85   92.35   <1      3.7      0.2    4.90
9      90   75   23.02   9.27     2.20   2.65   59.74    90.46   88.51   <1      3.7      0.2    4.90
10    120   25   13.51   6.96     0.47   0.13   48.50    96.52   99.06   <1      7.6      1.4    10.30
11    120   50   12.66   6.02     0.50   0.37   52.47    96.09   97.08   <1      7.4      1.4    10.20
12    120   75   13.77   6.14     1.05   0.91   55.38    92.36   93.41   <1      7.6      1.5    10.30

PAGE 116

Table 26 Continued

Set     N    S   G-LVLF  G-AugNN  G-GA   G-Hyb  I-AugNN  I-GA    I-Hyb   T-LVLF  T-AugNN  T-GA   T-Hyb
13    150   25   12.69   6.83     0.02   0.41   46.20    99.83   96.76   <1      11.1     2.8    16.60
14    150   50   14.84   5.72     0.70   0.93   61.44    95.30   93.76   <1      12.2     2.6    17.10
15    150   75   15.10   5.82     0.52   0.70   61.46    96.54   95.35   <1      11.6     2.6    16.80
16    180   25   14.35   8.08     0.44   0.33   43.70    96.93   97.69   <1      15.3     3.3    22.30
17    180   50   14.87   5.56     0.48   0.99   62.63    96.75   93.35   <1      15.8     3.3    22.00
18    180   75   13.65   5.01     1.18   1.10   63.32    91.35   91.97   <1      16.5     3.3    22.70
19    360   25   12.59   6.01     0.26   0.16   52.27    97.94   98.71   <1      36.7     13.5   84.00
20    360   50   9.40    4.00     0.38   0.25   57.44    95.99   97.35   <1      43.9     13.1   86.60
21    360   75   14.06   3.73     0.82   0.47   73.47    94.16   96.66   <1      50       13.3   89.10
22    720   25   9.15    4.44     0.01   0.11   51.45    99.85   98.83   <1      153.4    63.6   435.60
23    720   50   8.58    2.75     0.09   0.36   67.94    98.95   95.85   <1      127      63.7   396.80
24    720   75   8.77    2.44     0.21   0.60   72.24    97.63   93.15   <1      144.9    63.7   397.20
25   1440   25   4.86    4.17     0.18   0.25   14.21    96.28   94.77   <1      441.7    264.5  1586.40
26   1440   50   7.83    2.63     0.16   0.40   66.44    97.91   94.93   <1      503.9    262.8  1465.70
27   1440   75   6.03    2.16     0.39   0.32   64.20    93.51   94.69   <1      621.7    260.6  1464.70
AVG              15.73   8.17     1.44   1.37   51.59    93.09   93.16   <1      83.21    38.58  228.84

PAGE 117

6.3.3 The Modified Maxspace with Nonlinear Pricing (MMSwNLP) Problem Results

Table 27: MMSwNLP Comparison of Results
Legend as in Table 25.

Set     N    S   G-LVLF  G-AugNN  G-GA   G-Hyb  I-AugNN  I-GA    I-Hyb   T-LVLF  T-AugNN  T-GA   T-Hyb
1      30   25   19.72   6.17     5.87   4.95   68.70    70.25   74.92   <1      0.1      1      1.00
2      30   50   21.79   5.45     4.47   4.10   74.97    79.50   81.18   <1      0.7      1      1.20
3      30   75   23.87   5.73     4.84   4.43   75.98    79.70   81.43   <1      0.1      1      1.00
4      60   25   15.33   3.83     2.85   1.15   75.00    81.39   92.52   <1      0.8      2      2.10
5      60   50   22.23   3.78     3.59   2.59   83.01    83.83   88.33   <1      0.8      2      2.00
6      60   75   25.62   4.42     3.77   2.96   82.75    85.30   88.46   <1      0.6      2      2.00
7      90   25   17.00   3.95     2.74   1.57   76.75    83.89   90.74   <1      1.5      3.3    3.80
8      90   50   19.64   5.11     2.24   1.39   73.98    88.58   92.94   <1      1.6      3.3    3.40
9      90   75   22.90   4.63     2.36   1.27   79.77    89.68   94.46   <1      1.8      3.2    3.30
10    120   25   15.63   2.62     1.12   0.65   83.22    92.81   95.82   <1      2.4      5      5.20
11    120   50   17.87   3.48     1.87   1.67   80.53    89.55   90.65   <1      2.6      5      5.20
12    120   75   18.66   3.86     2.27   1.50   79.33    87.82   91.95   <1      2.6      5      5.20
13    150   25   15.19   3.51     1.73   1.58   76.87    88.62   89.57   <1      3.6      6.3    7.20
14    150   50   15.92   3.25     2.13   1.60   79.56    86.64   89.94   <1      3.4      6      7.10
15    150   75   16.95   3.68     2.07   1.63   78.29    87.76   90.35   <1      3.2      6      7.10
16    180   25   17.02   3.19     1.66   1.52   81.24    90.22   91.10   <1      4.5      8      9.70
17    180   50   15.97   3.90     1.67   1.61   75.59    89.52   89.95   <1      4.5      7.9    9.90
18    180   75   17.01   3.62     1.95   1.70   78.73    88.53   90.03   <1      5.7      8      10.30
19    360   25   17.12   3.63     1.25   1.02   78.79    92.71   94.03   <1      17.3     18.5   27.70
20    360   50   17.10   4.53     2.00   1.54   73.50    88.28   90.99   <1      15.8     17.7   28.00
21    360   75   18.64   4.71     2.38   1.51   74.76    87.23   91.91   <1      14       17.9   26.70
22    720   25   11.49   4.27     4.69   3.40   62.79    59.21   70.44   <1      22.6     40.2   69.60
23    720   50   10.89   4.81     4.35   3.90   55.83    60.05   64.23   <1      23.3     40     66.70

PAGE 118

Table 27 Continued

Set     N    S   G-LVLF  G-AugNN  G-GA   G-Hyb  I-AugNN  I-GA    I-Hyb   T-LVLF  T-AugNN  T-GA   T-Hyb
24    720   75   12.63   5.21     5.08   4.33   58.74    59.77   65.73   <1      25.3     39.6   69.80
25   1440   25   12.06   2.64     2.29   1.88   78.08    81.01   84.44   <1      107.1    109.5  323.10
26   1440   50   16.90   2.36     1.77   2.06   86.02    89.50   87.83   <1      114.9    99     322.70
27   1440   75   13.39   3.19     2.83   2.51   76.15    78.83   81.25   <1      97.7     99.5   341.90
AVG              17.35   4.06     2.81   2.22   75.89    82.97   86.49   <1      17.72    20.66  50.48

PAGE 119

6.4 Discussion of the Online Advertisement Scheduling Results

As anticipated, for each of the three problems, with one small exception, there is a clear, direct relationship between processing time and solution quality. The greedy heuristic was by far the fastest technique for each of the problem instances, averaging less than one second of CPU processing time per problem, but it also provided the poorest results, with an average % gap ranging from 7.08% for the MMS problem to 17.35% for the MMSwNLP problem. On the other end of the spectrum, the proposed Hybrid technique, which combines the GA and AugNN methods, provided the best solution quality for all three problems, with an average % gap ranging from 0.60% for the MMS problem to 2.22% for the MMSwNLP problem. However, the improved results do come at a higher processing cost: the average processing time for the Hybrid technique ranged from 44.57 CPU seconds for the MMS problem to 228.84 seconds for the MMSwAT problem. In the middle of this processing time / solution quality continuum, we find the AugNN and GA metaheuristics. When applied independently, for the MMS and MMSwAT problems the GA proved to be better than the AugNN, providing slightly better results with lower associated average processing times. However, for the MMSwNLP problem, the performance relationship between the AugNN and GA is not quite as clear-cut. The GA still provides slightly better solutions, with an average gap of 2.81% versus the AugNN average gap of 4.06%, but the GA has a higher processing cost, taking an average of 20.66 CPU seconds of processing time while the AugNN requires an average of only 17.72 seconds.

PAGE 120

CHAPTER 7
SUMMARY, CONCLUSIONS AND FUTURE RESEARCH

This research consists of two implicitly connected projects: IR Based Ad Targeting and Online Ad Scheduling. We hope that our work on both will be beneficial to ad publishers and researchers in the future.

It is widely accepted that attempting to estimate a consumer's affinity for a particular product or service is extremely challenging, and it is also common knowledge that any improvement in our ability to do so can be extremely beneficial to many companies, including online ad publishers. The first part of our research attempts to provide an alternative solution technique for this challenging problem as it is faced by the online ad industry. We developed and proposed a methodology that attempts to leverage a potential consumer's Web surfing history, utilizing information retrieval based techniques combined with a powerful lexical database, in an effort to gain an improved understanding of a user's product and service interests. To test the relative effectiveness of the technique, each user was served a series of advertisements, selected on the basis of this analysis, for many different products and services, and was asked to rate their level of interest in each. The results were compared to a similar set of user evaluations of ads which were selected randomly. Because of the difficulty of this estimation task, random ad selection is a commonly used method in industry. The results were very promising, with the proposed technique outperforming the alternative in each case. Consequently, if provided an opportunity to analyze a user's surfing behavior, we feel that the proposed technique is certainly worthy of consideration and could be a viable ad selection

PAGE 121

technique that publishers might find very financially beneficial. Although the initial results for the proposed technique are very promising, several research extensions are possible. We limited the inherent analysis to only three sections of the html documents. We hope that in future research an expanded analysis of additional sections of the html documents might lead to an even more powerful solution alternative. In addition, we hope to develop a future analysis which leads to more conclusive results with respect to the informational importance of the different html sections. This will help users focus in on a very good set of structural element weights.

The second part of our project focuses on another very difficult problem faced by online ad publishers: the ad scheduling problem. Advertisement space is the most precious resource available to ad publishers; therefore, they must make every effort to utilize it as efficiently as possible. We introduce three real-world variations of the NP-hard online ad scheduling problem which, to the best of our knowledge, are new to the academic literature. In addition, we propose and test several alternative solution techniques for each. As is the case with most difficult optimization problems, a user attempting to tackle a problem is often faced with the difficult process of deciding which solution method is best for their particular situation. This, more often than not, requires them to consider foregoing solution quality in lieu of processing time, or vice versa. For the three proposed problems, which are routinely faced by online advertisement publishers, our goal was to introduce the problems to the academic literature and to provide several potential solution techniques for each. The alternative techniques have been tested against a large set of sample problems which vary in size and difficulty in an effort to provide potential users with a basic estimation of their

PAGE 122

performance potential. One contribution of our work is the introduction of the hybrid technique. Although many other hybrid techniques have been developed, to the best of our knowledge, this is the first time that the discussed augmented neural network has been combined with a genetic algorithm. This technique performed extremely well, dominating two very popular metaheuristic techniques with respect to solution quality on all three problems. Based on its performance, in future research we plan to test this technique as an alternative solution method for other difficult optimization problems. In addition, from the results, we believe that we have also demonstrated that ad scheduling problems of realistic scope which incorporate realistic objectives, though NP-hard, can be reasonably solved. Further, we believe that we have provided insight into the tradeoffs between solution quality and solution time for these problems for a range of possible heuristic and metaheuristic approaches (some of which were developed specifically for this research). We first hope that publishers find our work to be helpful, and secondly we hope that our work might motivate other researchers to develop and introduce additional alternative techniques for the proposed problems in the future.

PAGE 123

APPENDIX A
GA AND AUGNN PARAMETER AND SETTING DEFINITIONS

GA:
Population Size - the number of chromosomes (solution strings) included in each population
Selection Process - the process by which parent chromosomes are selected from the population
Mutation Process - the process by which individual bits are selected as potential mutation candidates
Mutation Probability - the probability that a mutation candidate is mutated
Crossover Process - the process by which parental chromosomes are combined to form the children chromosomes
Crossover Probability - the probability that two parent chromosomes will undergo reproduction
Stopping Criteria - the criteria which determine how many generations are included in the GA run
Coding Scheme - the process by which potential solutions are coded as chromosome strings

AugNN:
Stopping Criteria - the criteria which determine how many iterations are included in each AugNN run
Learning Rate - the rate by which the AugNN weights are modified from iteration to iteration
Backtracking Factor - the number of iterations that are allowed without finding a better solution, after which the set of weights is reset to those that were used to find the best solution thus far
Number of Backtracks Allowed - the number of backtracks allowed without finding a better solution, after which the Learning Rate is increased
Learning Rate Multiplicative Factor - the rate by which the learning rate is increased if the number of allowed backtracks has been exceeded
Reinforcement Factor - the rate by which the set of weights is reinforced if a better solution is found with the last set of weight adjustments
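To illustrate where the GA parameters above enter the algorithm, the following is a minimal, generic GA skeleton; the coding scheme, fitness function and parameter values are placeholders for illustration, not the settings used in this research:

    import random

    # Illustrative parameter values only.
    POP_SIZE, CROSSOVER_P, MUTATION_P, GENERATIONS = 50, 0.8, 0.02, 200
    CHROM_LEN = 32  # coding scheme placeholder: a solution is a fixed-length bit string

    def fitness(chrom):
        # Placeholder objective (count of 1-bits); a real application would decode
        # the chromosome into an ad schedule and score the space it utilizes.
        return sum(chrom)

    def select(pop):
        # Selection process: binary tournament.
        a, b = random.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    def crossover(p1, p2):
        # Crossover process: single point, applied with probability CROSSOVER_P.
        if random.random() < CROSSOVER_P:
            cut = random.randrange(1, CHROM_LEN)
            return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
        return p1[:], p2[:]

    def mutate(chrom):
        # Mutation process: every bit is a candidate; each flips with MUTATION_P.
        return [1 - g if random.random() < MUTATION_P else g for g in chrom]

    pop = [[random.randint(0, 1) for _ in range(CHROM_LEN)] for _ in range(POP_SIZE)]
    for _ in range(GENERATIONS):  # stopping criteria: fixed number of generations
        new_pop = []
        while len(new_pop) < POP_SIZE:
            child1, child2 = crossover(select(pop), select(pop))
            new_pop += [mutate(child1), mutate(child2)]
        pop = new_pop[:POP_SIZE]
    print(max(fitness(c) for c in pop))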

PAGE 124

APPENDIX B
LIST OF ADVERTISED PRODUCTS AND SERVICES AND THEIR RESPECTIVE CHARACTERISTIC ARRAYS

PAGE 125

Ad #  Ad Name                Keywords 1-10
1     Dell                   computer pda computer dell dell apple pc laptop desktop gateway
2     Target                 target target clothes shoe electronic buy walmart bath shop shopping
3     Masters                golf golf golf golf mickelson woods masters driver clark sport
4     Women's Shoes          shoes shoe shoe shoe shoe clothes business dress shop suit
5     Vera Bradley Bag       bag bag purse pocketbook purse pocketbook pocketbook bradley bradley shop
6     Pennzoil               oil oil car care oil lube lube oil oil oil
7     Nascar                 race race race racing racing goodyear race racing nextel cup
8     SUV                    truck automobile car truck car automobile truck travel loan auto
9     kohler faucets         faucet bathroom toilet sink faucet remodel faucet sink bathroom faucet
10    Flat Screen TV         television tv tv screen plasma movie sony sport football movie
11    Gators Football        football football gator gator football sport athletic basketball gator gator
12    Computer               computer computer window monitor dell dell pc apple college pc
13    Pedialyte              infant dehydration hangover toddler baby dehydrate electrolyte hydrate infant
14    Gator Sport Shop       gator gator gator florida gators florida florida clothing football short
15    Kodiak                 dip tobacco nicotine dip tobacco kodiak tobacco smoke dip smokeless
16    Cars                   car car automobile loan jeep car jeep auto automobile graduation
17    Nordstrom              clothing jean shirt shoe gap pant shop purchase clothing clothing
18    flowers.com            flower floral floral floral flower anniversary birthday sick graduation party
19    HP                     computer printer personal computer dell electronic dell pc pc pc
20    home depot             depot floor home home depot repair lawn garden appliance home
21    Discover Card          travel visa discover credit debt purchase finance card visa credit
22    Etrade                 stocks stock investment job stock trade bond stock investment investment
23    US Air                 hotel vacation travel travel ticket delta fly airline airline airline
24    USA Today              news newspaper classified sport leisure news today life news weather
25    Orkin                  bug exterminate exterminator bug pest pest bug kill bug exterminate
26    Digital Camera         camera camera olympus picture cannon camera picture digital vacation photo
27    Jewelry                jewelry anniversary engagement jewelry ring watch gift jewelry present birthday
28    pricegrabber.com       shop shop price compare buy purchase buy price compare price
29    bed bath and beyond    bathroom laundry beyond bed bath towel kitchen appliance home bedding
30    myspace.com            social networking sharing friend friend blog music finding friend band
31    engagement ring        marry fiance engagement engaged marriage ring wedding anniversary wife engagement

PAGE 126

Ad #  Ad Name                Keywords 1-10
32    careerbuilder.com      career employment salary career employee monster application resume employer job
33    Barnes and Noble       book read novel book nonfiction book noble amazon million best
34    Abercrombie and Fitch  jean gap clothes gap clothing shirt jean jacket short shop
35    Lenox China            china glass lenox china dine plate china fine crystal lenox
36    eharmony.com           date companion friend match date single harmony relationship single compatibility
37    realtor.com            condominium house sale realtor agent realtor sell estate realtor realtors
38    Harley Davidson        bike harley honda hog motorcycle softail poker ride motorcycle ride
39    AOL Travel             vacation flight spring break trip cruise vacation vacation trip reservation
40    Mapquest               direction travel vacation atlas street drive mapquest direction trip route
41    Michael's              scrap scrap lobby hobby craft craft flower floral art art
42    ESPN                   sport score news basketball football score classic insider sport score
43    Blockbuster online     actor actress movie blockbuster gallery movie game actress movie blockbuster
44    Shands Hospital        sick broken injury health health care doctor nurse health care
45    Amazon.com             book movie novel video amazon amazon music electronic electronic book
46    Trugreen               lawn lawn yard care mow mower grass lawn yard yard
47    BabiesRUs              toddler baby pregnant baby family child toddler infant baby child
48    NetJets                flight trip jet luxury airline airplane travel meeting luxury private
49    Victoria's Secret      lingerie secret romance anniversary birthday gift bra lingerie anniversary romance
50    Forbes.com             business investment stock stock corporate investment investment business bond wealth
51    People                 celebrity star movie people magazine gossip celebrity singer news photo
52    American Idol          american fox television sing simon celebrity idol star music pickler
53    HGTV Newsletter        home repair fix home depot bath house kitchen repair bathroom
54    Spysweeper             virus virus pc memory computer security pc virus virus security
55    Moving company         move moving shipping storage job moving move pack job graduation
56    Sherwin Williams       paint paint paint wall remodel repair home depot wall repair
57    Foreclosures (homes)   home foreclosure tax debt purchase move mortgage tax foreclosure foreclosure
58    Camel Lights           cigarette cancer nicotine tar alcohol drink party smoke habit sick
59    Playstation            gaming play gaming gaming video console play game gamer gamer
60    Ghost Recon Game       gaming play gaming gaming video recon play game gamer gamer
61    Football Xbox Game     football gaming athletic game gaming athletic gator game video sport
62    NHL Playofffs          hockey ice ice hockey athletic sport play ice hockey walker
63    NFL Draft              football athletic gator meyer sport sport athletic draft football national

PAGE 127

Ad #  Ad Name                Keywords 1-10
64    AARP                   health career elderly retirement health care doctor patient health sick
65    Facebook               social network school welcome group finding friend social friend school
66    Online College         degree on line acceptance school line on on phoenix phoenix
67    State Farm Insurance   insurance life insurance insurance insure loss casualty health insurance insure
68    Baby Einstein          infant baby infant family baby einstein cry baby infant gerber
69    Heinz                  ketchup heinz heinz heinz food condement ketchup heinz fries french
70    Omaha Steaks           steak steak beef tenderloin steak dinner grill bake cook food
71    Addall                 book text class book text book textbook book novel amazon college
72    Goodyear               tire tire automobile tire rim tire tire tire tire tire
73    John Deere             lawn john deere mower tractor mower john deere yard lawn
74    Plastic Surgery        beauty plastic surgery face beauty surgeon plastic plastic pretty health
75    Remington              gun gun firearm rifle shot bullet knife hunt scope gun
76    Bosshardt Realty       realty house rent realty move real estate estate real realtor
77    Gator Basketball       basketball basketball sport sport gator gator gator gator donovon noah
78    Beach Setting          trip travel trip vacation vacation vacation beach ocean break spring
79    Hertz                  rent rental rental car travel rental travel rent rent car
80    Dynamic Playground     gaming play gaming gaming video console play game gamer gamer
81    Playgrounds            play swing swing slide play child play swing lawn children
82    FSBO                   sale for by owner sale home realty realty house for
83    Lending Tree           loan mortgage loan mortgage mortgage loan realty loan finance mortgage
84    Ipod                   apple apple music apple music listen music apple music apple
85    Lucas Oil              oil oil car care oil lube lube oil oil oil
86    Depends                diaper depend diaper elderly bladder bladder urine depend old elderly
87    Qray                   magnet health magnet bracelet bracelet golf magnet magnetic gold silver
88    Lennox                 air air condition cool heating cooling air cooling conditioning lennox
89    Nike                   sport athletic athletic running tennis nike shoe volleyball run nike
90    David's Bridal         wedding dress gown marriage wedding dress marriage wedding marriage engagement
91    Mont Blanc             pen writing pen instrument gift pen ink write writing ink
92    Futures Magazine       future derivative derivative risk trade trade option option future derivative
93    Cell phone (razor)     cell phone phone phone cellular number motorola cellular phone razor
94    iTunes                 music apple apple country apple apple rock music music digital
95    Armorall               auto tires tire clean car auto car clean clean protect

PAGE 128

Ad #  Ad Name                Keywords 1-10
96    Ebay                   buy sell auction auction apparel buy purchase collectible shop auction
97    Pickalawyer.com        lawyer lawyer lawyer attorney attorney litigation sue divorce court trial
98    Zip Drive              storage storage computer computer storage pc zip file drive drive
99    Playboy.com            adult sex rack sex boob breast nude nudity adult sex
100   Ping                   golf driver ping golf taylor ball athletic golf iron play
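To illustrate one way these characteristic arrays might be consumed, the following is a minimal sketch that ranks ads by simple keyword overlap with a user term profile; the profile, the scoring rule and the function name are our illustration, not the exact IR matching procedure used in the study:

    # Hypothetical user profile: terms mined from surfing history, with counts.
    user_profile = {"poker": 12, "hockey": 7, "golf": 2, "dictionary": 3}

    # Two characteristic arrays from the list above (ads 3 and 62).
    ads = {
        "Masters": ["golf", "golf", "golf", "golf", "mickelson",
                    "woods", "masters", "driver", "clark", "sport"],
        "NHL Playoffs": ["hockey", "ice", "ice", "hockey", "athletic",
                         "sport", "play", "ice", "hockey", "walker"],
    }

    def overlap_score(profile, keywords):
        # Sum the profile counts over every keyword occurrence in the ad's array.
        return sum(profile.get(k, 0) for k in keywords)

    for name, kws in sorted(ads.items(), key=lambda a: -overlap_score(user_profile, a[1])):
        print(name, overlap_score(user_profile, kws))  # NHL Playoffs 21, Masters 8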

PAGE 129

APPENDIX C
SAMPLE DOCUMENTS FOR ONE USER FROM THE IR BASED AD TARGETING PROCESS

Appendix C1: Sample User Input File

==================================================
URL : res://C:\Program%20Files\AIM\WNDUTILS.dll/1960
Title : about:blank
Hits : 84
Modified Date : 4/22/2006 9:22:20 PM
Expiration Date : 5/3/2006 9:15:12 PM
User Name : Sean
==================================================
URL : http://client.speedbit.com/DownloadPane/Default.aspx?Filename=aWVodi5leGU=&U=aHR0cDovL2JlYXIuY2JhLnVmbC5lZHUvcGF0aGFrL2V4dHJhX2NyZWRpdC9pZWh2LmV4ZQ==&RF=aHR0cDovL2hwLm5ldHNjYXBlLmNvbS9ocC5hZHA=&L=4294967295&LH=4294967295&K=3&INC=2&R=0&V=8.0.4.4&LNG=1033&PCX=144&PCY=120&HCS=&FCS=None
Title :
Hits : 1
Modified Date : 4/22/2006 9:21:42 PM
Expiration Date : 5/3/2006 9:14:34 PM
User Name : Sean
==================================================
URL : http://client.speedbit.com/DownloadPane/Defaultp.aspx
Title :
Hits : 1
Modified Date : 4/22/2006 9:21:42 PM
Expiration Date : 5/3/2006 9:14:34 PM
User Name : Sean
==================================================
URL : http://hp.netscape.com/hp.adp
Title : Netscape.com
Hits : 16
Modified Date : 4/22/2006 9:21:35 PM

PAGE 130

Expiration Date : 5/3/2006 9:14:26 PM
User Name : Sean
==================================================
URL : http://ie.redirect.hp.com/svs/rdr?TYPE=3&tp=iehome&locale=EN_US&c=Q405&bd=pavilion&pf=laptop
Title :
Hits : 7
Modified Date : 4/22/2006 9:21:33 PM
Expiration Date : 5/3/2006 9:14:24 PM
User Name : Sean
==================================================
URL : file:///C:/Documents%20and%20Settings/Sean/Desktop/DIS%20Extra%20Credit%20Project%20-%20Instructions.doc
Title :
Hits : 1
Modified Date : 4/22/2006 9:21:05 PM
Expiration Date : 5/3/2006 9:13:56 PM
User Name : Sean
==================================================
URL : file:///C:/Program%20Files/Titan%20Poker/data/poker/chat/chat.html
Title : C:\Program Files\Titan Poker\data\poker\chat\chat.html
Hits : 12
Modified Date : 4/22/2006 9:00:11 PM
Expiration Date : 5/3/2006 8:53:02 PM
User Name : Sean
==================================================
URL : about:blank
Title : Metalnyheter.se
Hits : 45
Modified Date : 4/22/2006 9:00:11 PM
Expiration Date : 5/3/2006 8:53:02 PM
User Name : Sean
==================================================
URL : http://promo.titanpoker.com/emails/client1/email.html

PAGE 131

Title : Titan Poker
Hits : 1
Modified Date : 4/22/2006 8:58:52 PM
Expiration Date : 5/3/2006 8:58:54 PM
User Name : Sean
==================================================
URL : http://promo.titanpoker.com/emails/client2/email.html
Title : Titan Poker
Hits : 3
Modified Date : 4/22/2006 8:58:52 PM
Expiration Date : 5/3/2006 8:58:54 PM
User Name : Sean
==================================================
URL : http://www.google.com/keyword
Title :
Hits : 1
Modified Date : 4/22/2006 8:58:50 PM
Expiration Date : 5/3/2006 8:58:52 PM
User Name : Sean
==================================================
URL : res://C:\PROGRA~1\NORTON~1\ISLALERT.DLL/SubExpired.htm
Title : %s
Hits : 1
Modified Date : 4/22/2006 8:50:04 PM
Expiration Date : 5/3/2006 8:42:56 PM
User Name : Sean
==================================================
URL : http://g.msn.com/0VD0/02/26?m=snl_1439_natalieraps.wmv&csid=3&sd=mbr
Title :
Hits : 1
Modified Date : 4/22/2006 8:42:14 PM
Expiration Date : 5/3/2006 8:35:06 PM
User Name : Sean
==================================================
URL : outlook://imap.ufl.edu/Inbox

PAGE 132

Title :
Hits : 2
Modified Date : 4/22/2006 8:40:39 PM
Expiration Date : 5/3/2006 8:33:30 PM
User Name : Sean
==================================================
URL : outlook://imap.ufl.edu/Inbox/Junk
Title :
Hits : 1
Modified Date : 4/22/2006 8:40:05 PM
Expiration Date : 5/3/2006 8:32:58 PM
User Name : Sean
==================================================
URL : outlook:today
Title : Outlook Today
Hits : 1
Modified Date : 4/22/2006 8:39:54 PM
Expiration Date : 5/3/2006 8:32:46 PM
User Name : Sean
==================================================
URL : res://wmploc.dll/SamiCaptioning.htm
Title :
Hits : 1
Modified Date : 4/22/2006 6:21:09 PM
Expiration Date : 5/3/2006 6:21:10 PM
User Name : Sean
==================================================
URL : http://www.cardplayer.com/poker_videos
Title : Exclusive poker videos brought to you by CardPlayer.com
Hits : 2
Modified Date : 4/22/2006 6:19:29 PM
Expiration Date : 5/3/2006 6:12:22 PM
User Name : Sean
==================================================
URL : http://cardplayer.com
Title : Poker Tournament News, Tips, Tools, Poker Strategy and Free Online Play

PAGE 133

Hits : 1
Modified Date : 4/22/2006 6:19:20 PM
Expiration Date : 5/3/2006 6:12:12 PM
User Name : Sean
==================================================
URL : http://signup.dictionary.com/wordoftheday
Title : Dictionary.com/Word of the Day Mailing List
Hits : 1
Modified Date : 4/22/2006 6:18:30 PM
Expiration Date : 5/3/2006 6:11:22 PM
User Name : Sean
==================================================
URL : http://dictionary.reference.com/wordoftheday/list
Title :
Hits : 1
Modified Date : 4/22/2006 6:18:28 PM
Expiration Date : 5/3/2006 6:11:20 PM
User Name : Sean
==================================================
URL : http://dictionary.reference.com
Title : Dictionary.com
Hits : 1
Modified Date : 4/22/2006 6:18:25 PM
Expiration Date : 5/3/2006 6:11:16 PM
User Name : Sean
==================================================
URL : http://www.google.com/search?q=earth+day
Title : earth day Google Search
Hits : 2
Modified Date : 4/22/2006 6:18:14 PM
Expiration Date : 5/3/2006 6:11:06 PM
User Name : Sean
==================================================
URL : http://www.google.com
Title : Google
Hits : 1

PAGE 134

Modified Date : 4/22/2006 6:18:08 PM
Expiration Date : 5/3/2006 6:11:00 PM
User Name : Sean
==================================================
URL : http://www.inlinewarehouse.com/HocSkateTour.html
Title : Tour Roller Hockey Skates
Hits : 2
Modified Date : 4/22/2006 6:10:19 PM
Expiration Date : 5/3/2006 6:03:12 PM
User Name : Sean
==================================================
URL : http://www.inlinewarehouse.com/hockey.html
Title : Inline Hockey Skates, Hockey Protective Gear, Hockey Sticks, Shafts, Blades, Hockey Wheels, Bearings, Hockey Bags
Hits : 2
Modified Date : 4/22/2006 6:09:44 PM
Expiration Date : 5/3/2006 6:02:36 PM
User Name : Sean
==================================================
URL : http://inlinewarehouse.com
Title : Inline Warehouse: Aggressive Skates, Hockey Skates, Fitness Skates, wheels, bearings and apparel from Salomon, K2, Rollerblade, Mission, Tour, CCM, Razors, Roces, USD, Heelys, Soap, Labeda, Red Star, Senate, Mindgame, Eulogy.
Hits : 1
Modified Date : 4/22/2006 6:09:37 PM
Expiration Date : 5/3/2006 6:02:30 PM
User Name : Sean
==================================================
URL : http://product-search.ebay.com/CDs_W0QQpovcsZ1226
Title : eBay cds at low prices
Hits : 1
Modified Date : 4/22/2006 6:09:22 PM
Expiration Date : 5/3/2006 6:02:14 PM
User Name : Sean
==================================================

PAGE 135

URL : http://us.ebayobjects.com/6k;h=v5|33cf|0|0|%2a|b;10898939;00;0;8417009;13191-275|73;15304726|15322622|1;;~sscs=%3fhttp://product-search.ebay.com/CDs_W0QQpovcsZ1226
Title :
Hits : 1
Modified Date : 4/22/2006 6:09:18 PM
Expiration Date : 5/3/2006 6:02:10 PM
User Name : Sean
==================================================
URL : http://www.ebay.com
Title : eBay New & used electronics, cars, apparel, collectibles, sporting goods & more at low prices
Hits : 2
Modified Date : 4/22/2006 6:09:15 PM
Expiration Date : 5/3/2006 6:02:08 PM
User Name : Sean
==================================================
URL : http://pokerwire.com/cache/11837.cache.html
Title : Poker Wire Poker News
Hits : 2
Modified Date : 4/22/2006 6:07:07 PM
Expiration Date : 5/3/2006 6:07:08 PM
User Name : Sean
==================================================
URL : http://pokerwire.com
Title : Poker Wire Poker News
Hits : 1
Modified Date : 4/22/2006 6:05:54 PM
Expiration Date : 5/3/2006 5:58:46 PM
User Name : Sean
==================================================
URL : http://fullcontactpoker.com/pokerjournal.php?subaction=showfull&id=1145520108&archive=&start_from=&ucat=&
Title : Poker Journal BOOOOOOOORING! by Daniel Negreanu
Hits : 2
Modified Date : 4/22/2006 6:05:37 PM
Expiration Date : 5/3/2006 5:58:28 PM
User Name : Sean

PAGE 136

==================================================
URL : http://fullcontactpoker.com/poker-journal.php
Title : Full Contact Poker : Your online poker community
Hits : 2
Modified Date : 4/22/2006 6:05:29 PM
Expiration Date : 5/3/2006 5:58:20 PM
User Name : Sean
==================================================
URL : http://fullcontactpoker.com
Title : Full Contact Poker : Your online poker community
Hits : 1
Modified Date : 4/22/2006 6:05:22 PM
Expiration Date : 5/3/2006 5:58:14 PM
User Name : Sean
==================================================
URL : http://imdb.com/title/tt0217505
Title : Gangs of New York (2002)
Hits : 2
Modified Date : 4/22/2006 6:03:05 PM
Expiration Date : 5/3/2006 5:55:56 PM
User Name : Sean
==================================================
URL : http://imdb.com/find?s=all&q=gangs+of+new+york
Title : IMDb Search
Hits : 2
Modified Date : 4/22/2006 6:03:02 PM
Expiration Date : 5/3/2006 5:55:54 PM
User Name : Sean
==================================================
URL : http://www.amazon.com/exec/obidos/search-handle-url/002-5481279-3758454?%5Fencoding=UTF8&dym=0&search-type=ss&index=music&field-keywords=opeth
Title : Amazon.com: Music Search Results: opeth
Hits : 1
Modified Date : 4/22/2006 6:01:24 PM
Expiration Date : 5/3/2006 5:54:16 PM

PAGE 137

User Name : Sean
==================================================
URL : http://imdb.com
Title : The Internet Movie Database (IMDb)
Hits : 1
Modified Date : 4/22/2006 6:01:22 PM
Expiration Date : 5/3/2006 5:54:14 PM
User Name : Sean
==================================================
URL : http://www.amazon.com/gp/search/ref=br_ss_hs/002-5481279-3758454?platform=gurupa&url=index%3Dmusic&keywords=opeth
Title :
Hits : 1
Modified Date : 4/22/2006 6:01:22 PM
Expiration Date : 5/3/2006 5:54:14 PM
User Name : Sean
==================================================
URL : http://www.amazon.com/gp/homepage.html/002-5481279-3758454
Title : Amazon.com: Online Shopping for Electronics, Apparel, Computers, Books, DVDs & more
Hits : 1
Modified Date : 4/22/2006 5:59:47 PM
Expiration Date : 5/3/2006 5:59:48 PM
User Name : Sean
==================================================
URL : http://www.opethforum.com/forum/showthread.php?t=19
Title : The Now Eating/Drinking Thread opethforum.com
Hits : 2
Modified Date : 4/22/2006 5:59:36 PM
Expiration Date : 5/3/2006 5:59:38 PM
User Name : Sean
==================================================
URL : http://www.opethforum.com/forum/forumdisplay.php?s=aa770411d8ae6144062081c96dceda09&f=4
Title : off-topic opethforum.com

PAGE 138

Hits : 2
Modified Date : 4/22/2006 5:59:13 PM
Expiration Date : 5/3/2006 5:52:04 PM
User Name : Sean
==================================================
URL : http://www.opethforum.com/forum/index.php
Title : opethforum.com Powered by vBulletin
Hits : 1
Modified Date : 4/22/2006 5:59:04 PM
Expiration Date : 5/3/2006 5:51:56 PM
User Name : Sean
==================================================
URL : http://opethforum.com
Title : unofficial opeth forum
Hits : 1
Modified Date : 4/22/2006 5:59:01 PM
Expiration Date : 5/3/2006 5:51:52 PM
User Name : Sean
==================================================
URL : http://www.ufl.edu
Title : University of Florida
Hits : 2
Modified Date : 4/22/2006 5:58:52 PM
Expiration Date : 5/3/2006 5:51:44 PM
User Name : Sean
==================================================
URL : http://www.isis.ufl.edu
Title : ISIS
Hits : 4
Modified Date : 4/22/2006 5:58:21 PM
Expiration Date : 5/3/2006 5:51:12 PM
User Name : Sean
==================================================
URL : http://hockeygiant.com/hockey-gloves.html
Title :
Hits : 1

PAGE 139

Modified Date : 4/22/2006 5:57:27 PM
Expiration Date : 5/3/2006 5:50:20 PM
User Name : Sean
==================================================
URL : http://hockeygiant.com
Title : Hockey Equipment from Hockey Giant Hockey Stick, Hockey Equipment, Ice Hockey Skates, Inline Hockey Skates, Bauer Roller Hockey Skates, Goalie Equipment, NHL Jersey, Bauer Hockey
Hits : 2
Modified Date : 4/22/2006 5:57:01 PM
Expiration Date : 5/3/2006 5:49:54 PM
User Name : Sean
==================================================
URL : http://extempore.livejournal.com/143890.html
Title : Paul Phillips thought for the day
Hits : 2
Modified Date : 4/22/2006 5:56:55 PM
Expiration Date : 5/3/2006 5:49:48 PM
User Name : Sean
==================================================
URL : http://extempore.livejournal.com
Title : Paul Phillips
Hits : 1
Modified Date : 4/22/2006 5:56:15 PM
Expiration Date : 5/3/2006 5:49:08 PM
User Name : Sean
==================================================
URL : http://moneycentral.msn.com/msn/stock_quote?Symbol=dis&a=0
Title : MSN Money dis Stock quote, Mutual fund quote, ETF quote, Index quote, Options quote
Hits : 1
Modified Date : 4/22/2006 5:56:02 PM
Expiration Date : 5/3/2006 5:48:54 PM
User Name : Sean
==================================================
URL : http://g.msn.com/0USHP/28?Symbol=dis&a=0

PAGE 140

Title :
Hits : 1
Modified Date : 4/22/2006 5:55:58 PM
Expiration Date : 5/3/2006 5:48:50 PM
User Name : Sean
==================================================
URL : http://www.msn.com
Title : MSN.com
Hits : 1
Modified Date : 4/22/2006 5:55:56 PM
Expiration Date : 5/3/2006 5:48:48 PM
User Name : Sean
==================================================
URL : http://www.metalnyheter.se/metalgodis/bildgalleri/2006/closeup_2006/060416_opeth/060416_opeth.htm
Title : Metalgodis
Hits : 1
Modified Date : 4/22/2006 5:55:33 PM
Expiration Date : 5/3/2006 5:48:24 PM
User Name : Sean
==================================================
URL : http://opeth.com
Title : Opeth The official site
Hits : 1
Modified Date : 4/22/2006 5:55:23 PM
Expiration Date : 5/3/2006 5:48:16 PM
User Name : Sean
==================================================
URL : http://www.nhl.com
Title : NHL.com The National Hockey League Web Site
Hits : 1
Modified Date : 4/22/2006 5:54:56 PM
Expiration Date : 5/3/2006 5:47:48 PM
User Name : Sean
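Each record in the file above is a block of "field : value" lines delimited by rows of "=" characters; a minimal parsing sketch follows (the function name and the restriction to three fields are our choices, not part of the original pipeline):

    import re

    def parse_history(text):
        # Split the export on separator rows of '=' characters, then pull the
        # URL, Title and Hits fields out of each non-empty record block.
        records = []
        for block in re.split(r"\n=+\s*", "\n" + text):
            fields = dict(re.findall(r"^(URL|Title|Hits)\s*:\s*(.*)$", block, re.M))
            if "URL" in fields:
                fields["Hits"] = int(fields.get("Hits") or 0)
                records.append(fields)
        return records

    sample = "URL : http://www.nhl.com\nTitle : NHL.com\nHits : 1"
    print(parse_history(sample))  # [{'URL': 'http://www.nhl.com', 'Title': 'NHL.com', 'Hits': 1}]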

PAGE 141

Appendix C2: Word File Keywords Section (Each occurrence of uniquepattern signifies the beginning of a new html file)

uniquepattern
uniquepattern
uniquepattern
uniquepattern
uniquepattern
uniquepattern
uniquepattern
uniquepattern
uniquepattern
the poker authority: card player is your source for poker reporting, articles, wsop and wpt updates, tools, odds calculators, and poker pros. poker news, tips, strategy, hold'em, texas holdem, free online poker, world series of poker, world poker tour, tournaments, poker rules, articles, how to play poker, cardplayer.com, online poker rooms
uniquepattern
dictionary.com word of the day mailing list word of the day mailing list free vocabulary builder sat verbal.
uniquepattern
dictionary.com word of the day mailing list word of the day mailing list free vocabulary builder sat verbal.
uniquepattern
free online english dictionary, thesaurus and reference guide, crossword puzzles and other word games, online translator and word of the day. online dictionary dictionaries glossary glossaries thesaurus reference definitions Webster's revised unabridged dictionary the american heritage dictionary of the english language english spanish french german italian latin greek sanskrit foreign languages learning linguistics word of the day fun word games recreation crossword puzzles faq frequently asked questions spelling spellcheck spellchecker translator
uniquepattern

PAGE 142

uniquepattern
uniquepattern
uniquepattern
uniquepattern
inlinewarehouse.com,inline warehouse,skates,inline skates,hockey,hockey skates,hockey inline skates,fitness,fitness skates,aggressive skates,street skates,protective gear,wheels,bearings,senate,senate skates,roces,roces skates,salomon,salomon skates,k2,k2 skates,mission,mission skates,tour,tour skates,razor,jofa,koho,labeda,labeda wheels,red star,easton,bauer,itech,protec,harbinger,ccm">
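Recovering the individual keyword documents from this file is then a matter of splitting on the delimiter; a one-function sketch (ours, under the delimiter convention documented above):

    def split_documents(text, delimiter="uniquepattern"):
        # Each delimiter occurrence starts a new html file's keyword text;
        # empty segments correspond to pages whose keywords section was empty.
        return [seg.strip() for seg in text.split(delimiter)][1:]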
Permanent Link: http://ufdc.ufl.edu/UFE0015400/00001

Material Information

Title: Scheduling Online Advertisements Using Information Retrieval and Neural Network/Genetic Algorithm Based Metaheuristics
Physical Description: Mixed Material
Copyright Date: 2008

Record Information

Source Institution: University of Florida
Holding Location: University of Florida
Rights Management: All rights reserved by the source institution and holding location.
System ID: UFE0015400:00001

Permanent Link: http://ufdc.ufl.edu/UFE0015400/00001

Material Information

Title: Scheduling Online Advertisements Using Information Retrieval and Neural Network/Genetic Algorithm Based Metaheuristics
Physical Description: Mixed Material
Copyright Date: 2008

Record Information

Source Institution: University of Florida
Holding Location: University of Florida
Rights Management: All rights reserved by the source institution and holding location.
System ID: UFE0015400:00001


This item has the following downloads:


Full Text











SCHEDULING ONLINE ADVERTISEMENTS USING INFORMATION RETRIEVAL
AND NEURAL NETWORK/GENETIC ALGORITHM BASED METAHEURISTICS














By

JASON DEANE


A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL
OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT
OF THE REQUIREMENTS FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA


2006

































Copyright 2006

by

Jason Deane


































This document is dedicated to my entire family, but especially to Amanda, Conner, mom
and pawpaw. Amanda and Conner provided me with the necessary strength,
determination and never ending support and were my inspiration in pursuing and
finishing my PhD. Mom and pawpaw are, without a doubt, the two most influential
people in my life. For good and for bad, everything that I am is as a result of my never
ending effort to model myself after these two amazing people. Pawpaw was the kindest
and most sincere person that I have ever met and although he's in a better place now, I
still think of him every day. My mother is the strongest and hardest working person that I
know and without her many sacrifices, my life could have been completely different and
I would never have had the opportunity to achieve this goal. Thank you!
















ACKNOWLEDGMENTS

I would like to especially thank my wife Amanda for supporting and putting up

with me throughout this process. I know that it was not easy. I would also like to thank

our families for their never ending support throughout this very challenging endeavor. In

addition, I would like to thank my dissertation committee and the DIS department staff

for their support and guidance. In particular I would like to thank and acknowledge my

advisor, Anurag Agarwal, and my co-chair, Praveen Pathak, for their countless hours of

training and support. I couldn't have done it without you!!i




















TABLE OF CONTENTS


page

ACKNOWLEDGMENT S .............. .................... iv


LIST OF TABLES ............ ...... .__ ..............vii..


LI ST OF FIGURE S .............. .................... ix


AB S TRAC T ......_ ................. ............_........x


CHAPTER


1 INTRODUCTION AND MOTIVATION ................. ...............1................


2 ONLINE ADVERTISING ................. ...............5............ ....


2. 1 Definitions and Pricing Models ................. ...............5...............
2.2 Literature Review .............. ...............7.....


3 INFORMATION RETRIEVAL METHODOLOGIES ................. ......................27


3 .1 Overvi ew ................. ...............27........... ...
3.2 Data Pre-processing ................. ...............30................
3.3 Vector Space M odel .............. ...............3 1....
3.4 Structural Representation............... .............3
3.5 W ordNet .............. ...............35....


4 LARGE SCALE SEARCH METHODOLOGIES .............. ...............38....


4. 1 Overvi ew ................. ...............3.. 8......... ...
4.2 Genetic Al gorithms............... ...............4
4.3 Neural Networks ................. ...............47........... ...
4.4 The No Free Lunch Theorem .............. ...............52....


5 RESEARCH MODEL(S) .............. ...............54....


5.1 Problem Summary ............... ...... .. .. .............5
5.2 Information Retrieval Based Ad Targeting .............. ...............56....
5.3 Online Advertisement Scheduling ................... .......... .......... .........6
5.3.1 The Modified Maxspace Problem (MMS) .............. ............ ...............6
5.3.2 The Modified Maxspace Problem with Ad Targeting (MMSwAT) ..........68











5.3.3 The Modified Maxspace Problem with Non-Linear Pricing
(M M SwNLP) .............. ...............72....
5.4 M odel Solution Approaches .......................... ..............7
5.4.1 Augmented Neural Network (AugNN) .............. ...............74....
5.4.2 Genetic Al gorithm (GA)............... ...............78..
5.4.3 Hybrid Technique ................. ...............82................
5.4.4 Parameter Selection ................. ...............83................
5.5 Problem set Development ................. ...............84...............

6 RE SULT S .............. ...............86....

6.1 Information Retrieval Based Ad Targeting Results............... .... ..... ...........8
6.2 Discussion of the Information Retrieval Based Ad Targeting Results ...............101
6.3 Online Advertisement Scheduling Results ................ ............................102
6.2. 1 Modified Maxspace (MMS) Problem Result.................... .................. .....104
6.2.2 The Modified Maxspace with Ad Targeting (MMSwAT) Problem
R e sults................... .......... ... ............. .... ......................10
6.2.3 The Modified Maxspace wint Nonlinear Pricing (MMSwNLP) Problem
R e sults.............. ..... ......_ ... ... .._ ...... ... .......... 0
6.3 Discussion of the Online Advertisement Scheduling Results ..........................109

7 SUMMARY, CONCLUSIONS AND FUTURE RESEARCH. ............... ... ............110

APPENDIX

A GA AND AUGNN PARAMETER AND SETTING DEFINITIONS ................... ..113

B LIST OF ADVERTISED PRODUCTS AND SERVICES AND THEIR
RESPECTIVE CHARACTERISTIC ARRAYS ................. ......... ................114

C SAMPLE DOCUMENTS FOR ONE USER FROM THE IR BASED AD
TARGETING PROCE SS ................. ...............119......... ......

LI ST OF REFERENCE S ................. ...............149................

BIOGRAPHICAL SKETCH ................. ...............159......... ......




















LIST OF TABLES


Table pg


1 Structural Element Weighting Schemes ................. ...............60........... ...


2 AugNN Parameter Values ................. ...............84................

3 GA Parameter Values ................. ...............84................


4 Hybrid Parameter Values .............. ...............84....

5 Summary of Mean Student Rankings for the 4 Selection Methods ................... ......88

6 T Test-Scheme 1 & Random Selection ................ ...............89........... ..


7 T Test-Scheme 2 & Random Selection ................ ...............89........... ..


8 T Test-Scheme 3 & Random Selection ................ ...............90........... ..


9 Summary of Mean Student Rankings for the Three Weighting Schemes. ...............91

10 T Test-Scheme 1-5 & Scheme 1-1 .............. ...............92....


11 T Test-Scheme 1-5 & Scheme 1-2 .............. ...............92....


12 T Test-Scheme 1-5 & Scheme 1-3 .............. ...............93....


13 T Test-Scheme 1-5 & Scheme 1-4 .............. ...............93....


14 T Test-Scheme 2-5 & Scheme 2-1 .............. ...............94....


15 T Test-Scheme 2-5 & Scheme 2-2 .............. ...............94....


16 T Test-Scheme 2-5 & Scheme 2-3 .............. ...............95....


17 T Test-Scheme 2-5 & Scheme 2-4 .............. ...............95....


18 T Test-Scheme 3-5 & Scheme 3-1 .............. ...............96....


19 T Test-Scheme 3-5 & Scheme 3-2 .............. ...............96....


20 T Test-Scheme 3-5 & Scheme 3-3 .............. ...............97....












21 T Test-Scheme 3-5 & Scheme 3-4 .............. ...............97....


22 T Test-Scheme 1 & Scheme 2............... ...............99...


23 T Test-Scheme 1 & Scheme 3 .............. ...............99....


24 T Test-Scheme 2 & Scheme 3 ................ ...............100........... ..


25 Problem Results............... ...............104


26 MMSwAT Comparison of Results ................. ...............105..............


27 MMSwNLP Comparison of Results ............. ....._ ....__ ............0


























































V111


















LIST OF FIGURES


Figure                                                                     page

1    A Screen Print of Yahoo's Shopping Page. Notice the advertising banner down
     the right hand side of the Web page .................................... 10

2    Pictorial Representation of Information Flow in Traditional Print Advertising .. 18

3    Pictorial Representation of the Information Flow in Online Advertising .. 19

4    Geometric Representation of the VSM .................................... 32

5    Classes of Search Methods (Basic Model Borrowed from [54]) ............. 40

6    Pictorial Representation of the Cerebral Cortex [91]

7    Pictorial Representation of a Basic Feed Forward ANN [91] .............. 50

8    Selected Parents Prior to Crossover .................................... 80

9    Resulting Offspring ..................................................... 81

10   Child 2 Prior to Mutation ............................................... 81

11   Child 2 After Mutation .................................................. 81

12   Q-Q Plot of Student Response Values ..................................... 87
















Abstract of Dissertation Presented to the Graduate School
of the University of Florida in Partial Fulfillment of the
Requirements for the Degree of Doctor of Philosophy

SCHEDULING ONLINE ADVERTISEMENTS USING INFORMATION RETRIEVAL
AND NEURAL NETWORK/GENETIC ALGORITHM BASED METAHEURISTICS

By

Jason Deane

August 2006

Chair: Anurag Agarwal
Cochair: Praveen Pathak
Major Department: Decision and Information Sciences

As a result of the recent technological proliferation, online advertising has become

a very powerful and popular method of marketing; industry revenue is growing at a

record pace. One very challenging problem which is faced by those on the publishing

side of the industry is ad targeting. In an attempt to maximize revenue, publishers try

their best to expose web surfers to a set of advertisements which are closely aligned with

their interests and needs. In this work, we present and test an information retrieval based

ad targeting technique which shows promise as an alternative solution method for this

problem. A second, very difficult, challenge faced by online ad publishers is the

development of an ad schedule which makes the most efficient use of their available

advertisement space. We introduce three versions of this problem and test

several potential solution techniques for each of them.















CHAPTER 1
INTRODUCTION AND MOTIVATION

Despite residual fears from the dot-com decline of 2000, many seem to be once

again embracing the Web. Worldwide Internet usage is at an all time high, broadband

access is soaring and many households are turning away from their televisions in lieu of

their computer screens [1]. The proliferation of the fiber optic telecommunication

infrastructure which was left over from the telecom boom of the 1990's has made

broadband connectivity accessible and affordable for almost any family. As a result, the

online experience has been vastly improved and is extremely popular with the technology

generation. According to Tom Hyland, Partner and New Media Group Chair,

PricewaterhouseCoopers, this has created a mass audience of Internet users which simply

cannot be ignored by advertisers. Corporate America is beginning to realize the potential

importance of expanding its advertising portfolio to include the online channel. This

sentiment is echoed by many corporate executives. David Garrity, a financial analyst at

Cars & Company, a Wall Street investment firm, asserts that "Every indication is that

corporate advertising budgets are increasingly allocated to the Internet" [2, p.1]. Ty

Montague, Wieden & Kennedy's chief creative officer, believes "Whereas people are

zapping most TV advertising, the Net is amazing for drawing people in, if our ingenuity

is up to it" [3, p.1]. These comments are typical of the current claims about the growth of

the online advertising channel.

The recent trend in online advertisement spending fully supports these claims.

According to a recent PricewaterhouseCoopers report, industry revenue for the calendar










year 2005 totaled $12.5 billion which represents a 23% year over year increase in

comparison with 2004 results [4]. Industry wide revenue has increased in 12 of the last

13 quarters. In addition, future projections of widespread mobile Internet access demand

are expected to provide an additional revenue boost for the industry. It is estimated that

online advertisement revenues for the US alone will grow to $18.9 billion by 2010 [4].

Motivated by this upward sloping trend in Internet advertising demand, many companies

(e.g., Google, Yahoo, AOL, etc.) have adopted a business model which is heavily

dependent upon the revenue stream generated from their publishing of online

advertisements [5]. As a result, efforts to improve the online advertisement scheduling

process are under extreme demand.

In personal conversations with Doron Welsey, Director of Industry Research for

the Interactive Advertising Bureau (IAB), and Rick Bruner, research analyst for

DoubleClick, a leading online advertising agency, both indicated that they have been

inundated with companies seeking help with their online advertising efforts. They also

indicated that research which helps overcome the IT-related challenges that currently face

the industry is critically needed and therefore is likely to be important to industry experts

and academicians alike. In this dissertation, we apply information retrieval and artificial

intelligence methodologies in an attempt to provide efficient, appealing solution

alternatives to one of the most difficult and compelling problems facing publishers,

online banner advertisement scheduling. Given the popularity of banner advertising and

the considerable revenue which it generates, even a small improvement in the efficiency

and/or quality of the scheduling process could result in a considerable increase in

revenue.










The goals of this thesis are three fold. First, we propose a methodology which,

based on a user's recent Web surfing behavior, provides an estimate of his or her level of

interest in a particular advertisement. Second, we introduce three new real-world

variations of the strongly NP-hard online advertisement scheduling optimization problem.

Finally, we develop and test several heuristic and meta-heuristic solution algorithms for

each of the new models that we propose.

Information Retrieval (IR) is an area of research which attempts to extract usable

information from textual data. We propose a method by which Information retrieval and

ontological methodologies are utilized to exploit a user's recent Web surfing history in an

effort to categorize ads based on the user's predicted level of interest. IR has historically

been employed in the field of library sciences, but it has recently gained favor in many

other fields, including Internet search and cyber security. The power of IR is its ability to

handle textual information. Information retrieval has been applied in many domains,

including document sorting, document retrieval, inference development and query

response. We use IR techniques to leverage the textual representation of a user' s html

Web surfing history in the creation of a weighted characteristic array for each user. We

create similar arrays for each advertisement and use several similarity measures to

strategically create a schedule of user-advertisement assignments.

The basic online advertisement scheduling optimization problem has been

addressed in the literature. Because it is an NP-hard problem [6], most of the variations

have been limited to linear pricing models which seek to maximize the number of ads

served or the number of times an ad is clicked. We introduce several new model









variations designed to address realistic issues such as nonlinear pricing and advertisement

targeting.

Obviously, the NP-hard nature of the basic linear problem means that these

variations will be even more difficult to solve optimally. We develop and test several

heuristic algorithms which may allow efficient generation of near-optimal solutions for

these models. Machine learning (ML) is the study of computer algorithms that improve

automatically through experience [7]. Machine learning is a subset of artificial

intelligence which has received considerable attention due to the recent increase in

available computing power. Machine learning methods such as decision trees, logit

functions, and neural networks have been applied successfully to a wide array of

problems, including optimization problems, and have therefore proven to be valuable

tools in the development of heuristic solution approaches. We combine neural network

and genetic algorithm techniques with several base heuristics in an effort to provide

efficient robust solution techniques for multiple variations of the online advertisement

scheduling problem.















CHAPTER 2
ONLINE ADVERTISING

This chapter presents a general overview of the online advertising industry and the

associated research. In section 2.1, we provide a review of the basic definitions and

pricing models. In section 2.2, we provide a review of the online advertisement

scheduling literature.

2.1 Definitions and Pricing Models

There are three primary participants in online advertising. At the top of the chain is

the advertiser. This is a company that enters into an agreement with a publisher in order

to enlist the publisher's assistance in serving their online advertisements. More

often than not, the ads are delivered to users of the publisher's Web pages. The publisher

is a company that expends resources in an effort to publish online advertisements in an

effort to generate revenue. The customer is the individual who browses Web pages and

may or may not respond to an ad in a manner that is verifiable, such as clicking the ad.

Publishers could be paid by the advertisers for their service according to a number

of possible schemes. The first category of pricing models is often referred to as

Impression Based Pricing Models because the publisher is paid entirely on the basis of

serving the ad, which is called an impression on the Webpage, and not on any action

taken by the customer. Thus, the publisher is paid whether or not the customer shows

any interest in the ad. The most basic impression based model is CPM-Linear Pricing.

CPM is short for cost per mille (mille is Latin for 1,000). In this scheme, the publisher is

paid a fixed fee for each 1,000 ads that are served. The fee is based on the size of the ad









and increases in a linear fashion. In addition, the rate may be different depending on the

chosen Web pages (sports, news, etc.), the time of day, etc.; however, many publishers

price each slot identically in an effort to simplify the accounting and scheduling

operations. Larger ads decrease the publishers' flexibility to schedule ads within a fixed

banner area; therefore, publishers might expect a premium for larger ads. CPM-

Nonlinear Pricing allows for this expectation. It is the same as CPM-Linear

Pricing except that the pricing function with respect to advertisement size is either a

concave or a step function instead of a linear function.

The third type of CPM model is called Modified CPM. This is a model which is

being used by publishers in an effort to increase the revenue which they receive for the

advertising space on their generic/non-targeted Web pages. Advertising space for the

targeted pages such as sports, automotive, and real estate is in high demand; however, the

space on the other non-targeted pages is much harder to sell. As a result, publishers have

started trying to charge a premium for the advertising space on these non-targeted pages

by employing consumer classification. The basic idea is that a user is classified based on

his or her click behavior and then served ads based on this classification. As an example,

a user that visits the sports page more than some threshold number of times is classified

as a sports person. The publisher then targets this consumer when he or she visits one of

the non-targeted pages and serves the consumer a sports related ad. The revenue that the

publisher is able to demand in this situation is not as high as it would have been had he or

she served the ad on the sports page, but it is higher than he or she could have received

for another random ad placement on the non-targeted page. Under all of the CPM based

models, the advertiser bears all of the financial risk. This is because the advertiser must










pay the publisher the agreed upon rate regardless of how well the advertisement

campaign performs.

The other primary category of pricing models, in contrast to impression-based

models, is Performance Based Models. These are models within which the publisher is

paid based solely on some pre-defined measure of ad campaign performance.

Performance Based CPC (Cost Per Click) is a scheme in which the publisher is paid a fee

each time the advertiser's ad is clicked. Performance Based CPS (Cost Per Sale) is

where the publisher is paid a fee each time one of the served ads results in a sale. In

Performance Based CPR (Cost Per Registration), the publisher is paid a fee each time a

consumer sets up an account with the advertiser as a result of the advertisement. Under

all of the performance based models, the publisher bears all of the financial risk. This is

because the publisher is paid nothing for simply publishing the advertisements. Instead,

he or she is paid only if pre-defined performance criteria are met.

Finally, Hybrid Pricing Models are pricing models which combine two or more of

the above models. Often, this type of model will include the CPM and one or more of the

performance based models in an effort to establish an equitable risk sharing situation

between the publisher and the advertiser. These have become very popular in industry.

2.2 Literature Review

The process of scheduling online advertisements can be a very challenging and

dynamic task which is characterized by a wide array of obstacles and constraints. The set

of constraints and difficulties differs vastly from publisher to publisher, depending on

their effort level and the methods with which they choose to address the problem.

Factors which affect the relative complexity level of the problem include which pricing

models the publisher chooses; which, if any, targeting efforts the publisher attempts to










employ, and which additional artificial intelligence techniques the publisher chooses to

build into their scheduling algorithms. Thus far, the primary focus of academic

researchers has been on addressing the most basic of these situations: a CPM pricing

model with no applied intelligence or targeting.

From a publisher's perspective, advertising space is a precious non-renewable

resource which, if used efficiently, can drive both current revenues and future demand.

The two primary goals, from an advertisement scheduling perspective, are to minimize

the amount of unused advertising space and to maximize the probability that a customer

will have interest in the advertisements which he or she is served. Depending on the

agreed upon pricing model, these goals can take on different levels of importance with

respect to the maximization of revenue. Publishers are currently compensated based on

some combination of the amount of Web space and number of advertisement impressions

delivered for an advertiser, and/or their score on a pre-defined set of performance based

measures such as the number of clicks, number of leads, or the number of sales. The

original pricing model for the online advertising industry was the CPM model. The CPM

model is a basic pricing structure which was adopted from traditional print advertising,

within which the publisher is compensated an agreed upon rate for every thousand

advertisement impressions that they deliver. This model was very popular in the 1990's

and is still being used by many companies [8]. Thus far academic research literature has

been primarily focused on models which are based on this pricing strategy.

The seminal online advertisement scheduling paper by Adler, Gibbons and Matias

[6] introduced two basic problems, the Minspace and the Maxspace problem, and proved

that both are NP-hard in the strong sense. The Maxspace problem is formulated based on









the CPM pricing model. The objective of the Maxspace problem is to find a feasible

schedule of ads such that the total occupied slot space is maximized given that the slots

have a fixed capacity and the ads are of differing sizes and differing display frequencies.

There are several assumptions which are inherent in the formulation of these two models.

First, it is assumed that each banner/time slot is the same size, S. Second, it is assumed

that all of the ads have the same width, which is equal to the width of the banner. This is

common practice when the banner ad space is found on either side of a Web page. See

figure 1 for an example from Yahoo's shopping page. Next, it is assumed that each ad

has a height which is less than or equal to the height of the banner. It is also assumed that

any user who accesses the Web site during a given time slot will see the same set of

advertisements. In addition, the authors assume that there is a positive linear relationship

between advertisement size and the revenue which is generated. Therefore, the objective

is to find a feasible set of ads which maximizes the used advertising space.

An IP formulation of the Maxspace problem is as follows:


$\max \sum_{j=1}^{N} \sum_{i=1}^{n} s_i x_{ij}$

s.t.

(1) $\sum_{i=1}^{n} s_i x_{ij} \le S, \quad j = 1, 2, \dots, N$

(2) $\sum_{j=1}^{N} x_{ij} = w_i y_i, \quad i = 1, 2, \dots, n$

(3) $x_{ij} = 1$ if ad $i$ is assigned to ad slot $j$, and 0 otherwise

(4) $y_i = 1$ if ad $i$ is assigned, and 0 otherwise

Where:

$n$ = total number of advertisements available for scheduling over the planning period

$N$ = total number of available time slots in the planning period

$S$ = banner height

$s_i$ = height of advertisement $i$, $i = 1, 2, \dots, n$

$w_i$ = display frequency of advertisement $i$, $i = 1, 2, \dots, n$


Figure 1. A Screen Print of Yahoo's Shopping Page. Notice the advertising banner
down the right hand side of the Web page.









Constraint (1) ensures that the combined height of the set of ads which are

scheduled for each banner slot does not exceed the available space. Another assumption

of the model is that if an advertisement is chosen, the number of delivered impressions

for that ad must exactly equal its pre-defined frequency, $w_i$. Constraints (2) and (4)

combine to ensure that this relationship is guaranteed. Constraint (3) ensures that at most

one copy of each ad can appear in any given slot. In other words, it is not acceptable to

schedule the same ad multiple times in a given banner slot. This constraint represents a

very important aspect of the online advertisement scheduling problem which

distinguishes it from other related bin packing and scheduling problems.
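
To make the formulation concrete, the following is a minimal sketch of the Maxspace IP
on a toy instance using the open-source PuLP modeling library; the library choice and
the data are our own illustrative assumptions, not part of the original formulation.

# A minimal sketch of the Maxspace IP using PuLP (library choice and toy
# data are illustrative assumptions, not part of the original formulation).
import pulp

S = 10                      # banner (slot) height
N = 4                       # number of time slots
s = [3, 5, 2]               # height of each ad i
w = [2, 3, 4]               # required display frequency of each ad i
n = len(s)

prob = pulp.LpProblem("Maxspace", pulp.LpMaximize)
x = pulp.LpVariable.dicts("x", (range(n), range(N)), cat="Binary")
y = pulp.LpVariable.dicts("y", range(n), cat="Binary")

# Objective: total occupied slot space
prob += pulp.lpSum(s[i] * x[i][j] for i in range(n) for j in range(N))

# (1) slot capacity: assigned ad heights cannot exceed S in any slot
for j in range(N):
    prob += pulp.lpSum(s[i] * x[i][j] for i in range(n)) <= S

# (2) & (4) all-or-nothing frequency: a selected ad appears exactly w_i times
for i in range(n):
    prob += pulp.lpSum(x[i][j] for j in range(N)) == w[i] * y[i]

# (3) is enforced by the binary x variables: at most one copy per slot
prob.solve()
print("used space:", pulp.value(prob.objective))
for i in range(n):
    if y[i].value() == 1:
        print(f"ad {i} in slots", [j for j in range(N) if x[i][j].value() == 1])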

The Minspace problem is very similar to the Maxspace problem. However, there

are a couple of significant differences. An IP formulation of the Minspace problem is as

follows:

$\min S$

s.t.

(1) $\sum_{i=1}^{n} s_i x_{ij} \le S, \quad j = 1, 2, \dots, N$

(2) $\sum_{j=1}^{N} x_{ij} = w_i, \quad i = 1, 2, \dots, n$

(3) $x_{ij} = 1$ if ad $i$ is assigned to ad slot $j$, and 0 otherwise

(4) $y_i = 1$ if ad $i$ is assigned, and 0 otherwise

Where:

$S$ = size of the slots

$s_i$ = size of ad $i$, $i \in N$

$w_i$ = frequency of ad $i$









One primary difference is that this problem does not assume that the size of the

banner/time slot is fixed. Instead, the objective of this problem is to schedule all of the

ads while minimizing the height of the tallest slot. The authors postulate that this

problem may be useful during the Website design phase. For this problem, they

developed the Largest Size Least Full (LSLF) algorithm, which is a 2-approximation and

they developed a Subset-LSLF algorithm for the Maxspace problem. The LSLF


algorithm, which can be implemented in time $O(\sum_i w_i + \mathrm{sort}(A))$, is a basic greedy


heuristic. The steps are detailed below.

Largest Size Least Full (LSLF) Algorithm

1. Sort the ads in descending order of size.

2. Assign each of the ads in sorted order; ad $i$ is assigned to the $w_i$ least full slots.

The Subset-LSLF algorithm is very similar. The steps are as follows.

Subset Largest Size Least Full (Subset-LSLF) Algorithm

1. Classify the ads into two subsets based on their relative size. If $s_i = S$, the ad is
placed in subset $B_S$; otherwise, it is placed in subset $B_k$.

2. Calculate the volume of advertisements for each subset, $\sum s_i w_i$.

3. Choose the subset with the largest volume. Assign the ads from this subset as long
as there is sufficient space available. For the $B_k$ subset, use the LSLF algorithm
for placement.
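
The following is a minimal Python sketch of the LSLF greedy heuristic, adapted here to
a fixed slot capacity $S$ as in the Maxspace setting; the toy data and the all-or-nothing
feasibility check are our own illustrative assumptions.

# A minimal sketch of the LSLF greedy heuristic (illustrative implementation;
# variable names follow the notation above).
def lslf(ads, N, S):
    """ads: list of (size s_i, frequency w_i). Returns (slot assignments, fullness)."""
    slots = [[] for _ in range(N)]          # ads assigned to each slot
    fullness = [0] * N                      # occupied height of each slot
    # consider ads in descending order of size
    for i, (s_i, w_i) in sorted(enumerate(ads), key=lambda t: -t[1][0]):
        # candidate slots: least full first; at most one copy of an ad per slot
        order = sorted(range(N), key=lambda j: fullness[j])
        chosen = [j for j in order if fullness[j] + s_i <= S][:w_i]
        if len(chosen) == w_i:              # schedule ad i only if all w_i copies fit
            for j in chosen:
                slots[j].append(i)
                fullness[j] += s_i
    return slots, fullness

slots, fullness = lslf([(3, 2), (5, 3), (2, 4)], N=4, S=10)
print(slots, fullness)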
The authors show that this is a 2-approximation algorithm for the special case

where ad widths are divisible and the profit of each ad is proportional to its volume

(width times display frequency). One limitation of their work is that nearly all of their

meaningful results pertain to this special case where the ad sizes are divisible.

Dawande, Kumar and Sriskandarajah [5] propose additional heuristic solution

techniques for both the Maxspace and the Minspace problems. For the Minspace

problem, they suggest a linear programming relaxation (LPR) based algorithm and a









Largest Frequency Least Full (LFLF) heuristic. The authors prove that the LPR is a 2-

approximation algorithm and that this bound is asymptotically tight. In addition, they

prove that the integrality gap for the algorithm is bounded above by $s_{max}$ and that the

time complexity is $O(n^{3.5}L + N(n + N - 1))$, where $L$ is the length of the binary encoding

of $LP_{min}$. The LFLF heuristic is very similar to the LSLF heuristic designed by Adler et

al. [6] except that the ads are assigned in the non-increasing order of their frequency

instead of non-increasing order of their size. The time complexity of the algorithm is

$O(n \log n + nN \log N)$, which is comparable to that of LSLF; however, the performance

bounds are slightly better. The performance bound for the LFLF algorithm for the

Minspace problem is:

$f(\text{LFLF solution}) \,/\, f(\text{optimal IP solution}) \le r$, where

1. $r = 2 - 1/(N - w_0 + 1)$ when $w_0 \le (N + 1)/2$

2. $r = \max\{1,\ (2N/(N + 2))((S^* - s_0)/S^*)\}$ when $w_0 > (N + 1)/2$

$w_0 = \min\{w_1, \dots, w_n\}$

$s_0 = \min\{s_1, \dots, s_n\}$

$S^* = \sum_{i=1}^{n} s_i$

The bound is tight in both cases. The authors also introduce two heuristics, MAX1

and MAX2, for the Maxspace problem. These algorithms involve a decomposition of the

set of ads into two subsets based on their frequencies. Based on the total weight of ads in

each subset, the algorithm gives priority to one of the subsets. All of the ads from that

subset are assigned using the LSLF heuristic and then the other subset of ads is assigned

likewise. Max1 has a time complexity of $O(n \log n + nN \log N)$ and a performance bound

$f/f^* \ge 1/4 + 1/(4N)$. Max2, which is a little more complicated, has a time complexity of

$O(n^3 N + nN \log N)$ and a performance bound $f/f^* \ge 3/10$. The authors tested their









heuristics against a test bed of problems. They created 10 sets of problems, with each set

containing 10 problems. The number of slots, N, ranged from [25, 100] and the ad sizes

ranged from [S/3, 2S/3]. One important contribution from their work is that they remove

the restrictive limitations on advertisement size which were present in the work by Adler

et al. [6]. The average percentage gap between the heuristic and optimum solutions for

the LFLF, Maxl and Max2 heuristics were approximately 30%, 15% and 20%

respectively.

Freund and Naor [9] propose additional heuristic-based solution techniques for the

Maxspace and the Minspace problems. Following the trend set by Dawande et al. [5],

this work also allows arbitrary ad sizes, but maintains all of the other assumptions

originally set out by Adler et al. For the Minspace problem, they propose the Smallest

Size Least Full (SSLF) heuristic. Their method is very similar to that of Adler et al.;

however, their heuristic considers the ads for placement iteratively in non-decreasing

order of size which is the exact opposite of the procedure proposed by Adler et al. For

the Maxspace problem, they propose a (3+E) approximation algorithm which combines a

knapsack relaxation and the SSLF heuristic. In addition, they also provide solution

techniques for two special cases; ad widths not exceeding one half of the display area,

and each advertisement must either occupy the entire area or else have a width at most

one half of the display area. No test results are provided for any of the proposed

algorithms.

Menon and Amiri [10] propose and test Lagrangean relaxation and column

generation solution techniques for a variation of the Maxspace problem. One major

difference in their work is that they relax the advertisement frequency constraint. Instead









of requiring each ad to appear a pre-defined number of times, they set an upper bound for

the number of times that each ad can appear. In their explanation, the authors make a

concerted effort to distinguish the scheduling horizon from the planning horizon. The

scheduling horizon corresponds to the length of time within which the publisher commits

to deliver a set number of advertisement impressions for their consumers, whereas the

planning horizon is the period of time for which we are trying to schedule a set of ads to

fill the banner space. They claim that the planning horizon should be shorter than the

scheduling horizon in order to provide scheduling flexibility for the publisher. According

to the authors, if these horizons are of unequal length as they recommend, the proposed

upper bound on the ad frequency should correspond with the number of ad impressions

left to be delivered for a given advertiser during the scheduling horizon. For example,

assume that our planning horizon is one week and that the problem at hand is to develop

an advertisement schedule for the third week of September. Let us also assume that we

have promised Dell that we will deliver 1000 impressions of their ad during the month of

September and thus far we have only delivered 100. In this situation, the upper bound for

the number of times that Dell's ad could be scheduled during the planning horizon (the

third week of September) is 900; however, there is no minimum requirement. The

proposed relaxation definitely provides additional flexibility and in doing so simplifies

the complexity of the problem considerably. However, it also creates another set of

potential problems. In the hypothetical example above, with the model formulation

provided by the authors, there is no way to guarantee that Dell's 1000 impressions would

be delivered within the agreed upon time frame. For obvious reasons, this is probably not

a desirable situation. The authors make a very compelling argument for their version of









the Maxspace problem, and it is likely that there are business situations within which

their model would be extremely useful. However, the discussed limitation should be

carefully considered prior to its application. To test their heuristics, the authors created a

large data set which consisted of 1500 problems. The number of advertisements and the

number of time slots ranged from [20, 200], and the size of the banner ranged from [40,

100]. In addition, the authors varied the selection process of the $w_i$ values, choosing

from several different uniform distributions. They applied the column generation and the

Lagrangean procedures to the entire data set. In addition, they combined the column

generation procedure with a greedy based preprocessing heuristic and tested it against the

entire data set. Their testing sequence indicated that the column generation procedure

performs much better than the Lagrangean relaxation procedure against their data set and

that the initialization heuristic only enhanced the column generation procedure's dominance.

Dawande, Kumar and Sriskandarajah [11] propose three improved heuristic

solution techniques for the Minspace problem. These solutions represent slightly better

performance bounds than those presented in their earlier work. They introduce

algorithms Min1 and Min2 which are both slight adaptations of their linear programming

relaxation solution (LPR) from their earlier work [12]. Each algorithm involves running

the LPR heuristic iteratively with contrasting stopping criteria. Min1 has a time

complexity of $O(nNL)$ and a performance bound of $1 + (1/Z)$. Min2 offers a slightly

better performance bound, $3/2$, but pays a cost in increased time complexity,

$O(nNL)$. In addition, they offer a heuristic solution for the online version of the

Minspace problem. This version requires that decisions concerning the scheduling of

individual ads be made without prior knowledge about the ads which will arrive in the









future. They recommend the First Come Least Full (FCLF) heuristic which schedules

each ad, assuming that there is sufficient space, as it arrives in the least full time slots.

This algorithm has a performance bound of $2 - (1/N)$. The authors do not test their

heuristics.

Kumar, Jacob and Sriskandarajah [13] developed and tested three new techniques

for the standard Maxspace problem. First, they proposed the Largest Size Most Full

(LSMF) heuristic, which is based on the Multifit algorithm that was developed by

Coffman, Garey and Johnson [14] as a solution technique for the classical bin packing

problem. This algorithm first finds the maximum slot fullness and then removes ads until

a feasible schedule is achieved. The ads are removed based on their relative volume

($w_i s_i$) in non-decreasing order. The authors point out that as the problem size grows in

terms of the number of time slots, $N$, and the number of advertisements, $n$, to a size that

is comparable to that which is experienced in industry, the basic heuristics that have been

proposed are not very efficient. Therefore, they turn to the world of meta-heuristics,

proposing a Genetic Algorithm (GA) based solution technique. GAs are directed global

search meta-heuristics which are based on the process of natural selection. GA based

solutions, in many cases, are extremely successful when applied to global optimization

problems. For a more in depth review, please see chapter 4. Lastly, they propose a

hybrid algorithm which combines the GA meta-heuristic with the LSMF and SUBSET-

LSLF base heuristics. The authors test each of their proposed algorithms and the

SUBSET-LSLF algorithm developed by Adler, Gibbons and Matias [6] against two

randomly generated data sets. The first data set consists of 40 small problems and the

second consists of 150 large problems. The number of time slots for the smaller










problems ranges from [5, 10]; for the larger problems the range was from [10, 100]. It

should be noted that CPLEX was unable to provide an optimal solution for any of the

larger problems in reasonable time. As anticipated, although their time requirements

were a little more demanding, the meta-heuristic and hybrid models performed extremely

well, dominating the performance of the heuristics for both data sets. The hybrid model

was the clear winner in terms of solution quality.
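
As a rough illustration of how a GA can drive this kind of search, the sketch below
evolves a population of ad subsets, scoring each subset by an LSLF-style placement. The
binary encoding, operators, and parameter values here are simplified assumptions of
ours, not the actual design used by Kumar et al. [13].

# A simplified GA sketch for selecting a subset of ads to schedule
# (illustrative; not the encoding or operators used by Kumar et al.).
import random

random.seed(42)
ads = [(3, 2), (5, 3), (2, 4), (4, 2), (6, 1)]   # (size s_i, frequency w_i)
N, S = 4, 10

def fitness(chrom):
    """Space used by the selected ads under an LSLF-style placement; 0 if infeasible."""
    fullness = [0] * N
    used = 0
    for i in sorted(range(len(ads)), key=lambda i: -ads[i][0]):
        if not chrom[i]:
            continue
        s_i, w_i = ads[i]
        order = sorted(range(N), key=lambda j: fullness[j])
        chosen = [j for j in order if fullness[j] + s_i <= S][:w_i]
        if len(chosen) < w_i:
            return 0                      # all-or-nothing frequency violated
        for j in chosen:
            fullness[j] += s_i
        used += s_i * w_i
    return used

pop = [[random.randint(0, 1) for _ in ads] for _ in range(20)]
for gen in range(50):
    pop.sort(key=fitness, reverse=True)
    parents = pop[:10]                    # truncation selection
    children = []
    while len(children) < 10:
        a, b = random.sample(parents, 2)
        cut = random.randrange(1, len(ads))        # one-point crossover
        child = a[:cut] + b[cut:]
        if random.random() < 0.1:                  # bit-flip mutation
            k = random.randrange(len(ads))
            child[k] = 1 - child[k]
        children.append(child)
    pop = parents + children
best = max(pop, key=fitness)
print("best subset:", best, "space used:", fitness(best))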

In its infancy, the industry embraced the CPM pricing model and used it relatively

effectively. However, over time many stakeholders recognized one primary difference

between online advertising and print advertising which motivated a move to new, more

equitable pricing models: a difference in the flow of information. In traditional print

media, information primarily flows in only one direction as described pictorially below.


Advertiser → Publisher → Customer



Figure 2: Pictorial Representation of Information Flow in Traditional Print Advertising

The advertiser provides the publisher with the advertisement and target audience

and the publisher provides the advertisements to the customers. At this point, the flow of

information, for all intents and purposes is over. This makes it very difficult to analyze

the effectiveness of a particular ad campaign. In an effort to overcome this problem, it is

common practice to attempt to correlate periodic revenue/sales trends with adaptations to

the marketing strategy. However, due to the plethora of potential causal factors,

establishing the true level of dependence of the two movements is very difficult and often

all but impossible. In contrast, the flow of information in online advertising is bi-

directional as described pictorially below.










Advertiser ↔ Publisher ↔ Customer




Figure 3: Pictorial Representation of the Information Flow in Online Advertising

The advertiser provides the publisher with the advertisement and target audience,

the publisher provides the chosen customers with the advertisements, and the customers,

via their actions, provide the publisher and the advertiser with immediate performance

feedback. Common consumer activities which are of particular interest include clicking

on the ad, setting up an account with the advertiser, and/or making a purchase. Unlike

the interaction in traditional media advertising, this two-way flow of information makes it

extremely easy to measure the effectiveness of an online ad campaign. As a result,

performance based pricing schemes such as the CPC, CPS or the CPA have become

extremely popular as the industry searches for a more equitable risk sharing situation

[15].

Several academic researchers have acknowledged this recent industry trend to

incorporate performance measures into the pricing models. The authors of papers in the

second stream of research have adapted their problem descriptions to account for this

performance based pricing scheme. The next series of papers reviewed are all focused on

a pure CPC pricing model, and therefore their objective functions attempt to maximize

the number of clicks and ignore the amount of space used.

Langheinrich et al. [16] assume that every customer has recently entered search

keyword(s) into a search engine and that the publisher has access to this list of keywords.

They propose a simple iterative method to estimate the probability of click-through $c_{ij}$ for









each ad/keyword pair based on historical click behavior. Given the resulting probability

matrix, they use a linear programming approach to solve the following LP.

$\max \sum_{i=1}^{n} \sum_{j=1}^{m} k_i c_{ij} d_{ij}$

s.t.

$\sum_{i=1}^{n} k_i d_{ij} = h_j, \quad j = 1, \dots, m$

$\sum_{j=1}^{m} d_{ij} = 1, \quad i = 1, \dots, n$

$d_{ij} \ge 0, \quad i = 1, \dots, n, \quad j = 1, \dots, m$

$d_{ij}$ = probability that ad $j$ will be displayed for keyword $i$

$h_j$ = desired display frequency for ad $j$

$m$ = total number of ads

$n$ = number of keywords in the current corpus

$k_i$ = input probability for keyword $i$

$c_{ij}$ = click-through rate of ad $j$ for keyword $i$
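
The following is a minimal numerical sketch of this LP using scipy.optimize.linprog;
the toy probabilities and frequencies are our own assumptions, and because linprog
minimizes, the objective is negated.

# A minimal sketch of the display-probability LP using scipy.optimize.linprog
# (toy data are illustrative assumptions).
import numpy as np
from scipy.optimize import linprog

k = np.array([0.5, 0.3, 0.2])        # keyword input probabilities, n = 3
h = np.array([0.6, 0.4])             # desired display frequencies, m = 2 ads
c = np.array([[0.02, 0.05],          # c[i][j]: click-through rate of ad j
              [0.04, 0.01],          # for keyword i
              [0.03, 0.03]])
n, m = c.shape

# Variables d[i][j] flattened row-major; maximize sum_ij k_i c_ij d_ij
obj = -(k[:, None] * c).ravel()

A_eq, b_eq = [], []
for j in range(m):                   # sum_i k_i d_ij = h_j for each ad j
    row = np.zeros((n, m)); row[:, j] = k
    A_eq.append(row.ravel()); b_eq.append(h[j])
for i in range(n):                   # sum_j d_ij = 1 for each keyword i
    row = np.zeros((n, m)); row[i, :] = 1
    A_eq.append(row.ravel()); b_eq.append(1.0)

res = linprog(obj, A_eq=np.array(A_eq), b_eq=b_eq, bounds=[(0, 1)] * (n * m))
print(res.x.reshape(n, m))           # display probabilities d_ij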


The objective of the problem is to maximize the likelihood that the delivered

advertisement will be clicked. The first constraint is a frequency constraint which

ensures that each ad is served the correct number of times. This is the same constraint

that is present in the Maxspace problem. The second constraint makes sure that the

display probabilities sum to unity for each keyword. The authors tested their solution

model through a series of simulations. Their artificially generated data set had 32 ads and

128 keywords. One potential limitation which is pointed out by the authors is that the

model is extremely sensitive to the accuracy of the click through probabilities. This

could cause a problem, given the inherent variability of these probability estimates. They

propose to avoid the unwanted ad domination by placing a floor for the display

probability of each of the ads. This ensures that each ad has some chance of being









selected. This problem is often referred to as the exploration/exploitation trade-off in

academic literature. The test results showed that the proposed method improved the

cumulative click through rates by approximately one percent over the random ad

selection procedure. This procedure may work well with smaller problems. However, as

the number of keywords grows to a realistic size, the search space will become very

large, and we would anticipate that the performance of the proposed LP based technique

would suffer. This model may be a good choice for a publisher who has selected a pure

CPC pricing scheme; however, it lacks several constraints, which limits its real-world

applicability. The model fails to limit overselling and fails to prohibit the same ad

from being displayed simultaneously in the same banner.

Tomlin [17] proposes an alternative nonlinear entropy-based approach to

overcoming the exploration/exploitation problem which was mentioned in the work by

Langheinrich et al. [16]. Their model avoids unrealistic solutions which only show ads to

a very narrow subset of users; however, its applicability is still somewhat limited in that it

ignores prevalent space limitations.

Similar to the work by Langheinrich [16], Chickering [18] proposes a system which

maximizes the click through rate given only advertisement frequency quotas. Instead of

using keywords, they partition the ad slots into "predictive segments or clusters". Each

cluster/ad combination has an associated probability of click through. They use an LP

based approach to solve for a maximum revenue generating ad schedule based on these

probabilities and the limiting frequency constraints. They also acknowledge the

exploration/exploitation problem and attempt to overcome the issue by clustering the

click through probabilities. Their method was tested on the msn.com Web site and it









performed favorably, with respect to time and revenue, against the manual scheduling

method that was currently in use.

Nakamura and Abe [19] identify several weaknesses of the LP based approach

which they proposed in the 1999 work which they co-authored with Langheinrich et al.

[16]. The authors propose potential solution techniques for each of these limitations,

including the exploration/exploitation issue, the data sparseness concern, and the multi-

impression issue. In an effort to overcome the exploration/exploitation issue, they

propose substituting the Gittins Index, an index developed by Gittins [20] which

maximizes the expected discounted rewards for the multi-armed bandit problem, in place

of the estimated click-through rates within the objective function. They also recommend

the use of an interior point LP method, and an alternative lower bounding technique for

determining the relative display probabilities. In an effort to deal with the large

cardinality of the search space they propose a clustering algorithm for the attributes,

thereby reducing the relative problem size. Lastly, they develop a queuing method in an

effort to eliminate the possibility of the same ad being shown multiple times in the same

banner slot. Similar to their previous work, the authors tested their proposed techniques

via a series of simulations on their artificially generated data set. Recall that this data set

is relatively small having only 32 ads and 128 keywords. The new technique performed

well in comparison with their original LP model and in comparison with a random

selection method. The average click-through rates were 5.3%, 4.8% and 3.5%

respectively.

Yager [21] proposes an intelligent agent system for the delivery of online Web

advertisements. The system utilizes a probability-based theme to select the









advertisements to deliver. The publishers are to share demographic data relative to their

Web customers with the advertisers. The advertisers, via a fuzzy logic based intelligent

agent, use this information to bid on advertising units with the publisher. The agents

iteratively adapt their bids based on the recurrent information relative to the site visitors.

The number of units won by a given publisher determines the probability that their ads

will be chosen. The publisher then uses a random number generator and the probability

matrix to select which advertisements to serve. Unfortunately, Yager's method was not

tested. One potential challenge in applying Yager's method is the difficulty in collecting

the necessary demographic data. Privacy laws make it very hard to collect good

demographic data similar to that which is recommended by the authors. Another method

to achieve a similar goal which has come under a little less scrutiny and which may be a

promising way to improve advertisement selection is to analyze a customer's surfing

behavior. As part of this research, we propose a framework to analyze the raw HTML from

a customer's recent click history using WordNet, a lexical database, and several

information retrieval techniques.

It is quite evident that the two streams of online advertisement research that we

have covered thus far are quite distinct, each having its own primary focus. The first

stream is focused on addressing the space limitations of banner advertisement scheduling,

taking into account the fact that banner ads are often of different sizes. Given that Web

space is at a premium, it is very common for ad prices to vary by size. Therefore,

allowing different size ads opens up the market to firms who may not be able to afford

the entire banner. While doing so increases revenue, it also creates an obvious

scheduling problem which the authors of the first stream have chosen to address. Under a










pure CPM model, which is the focus of this first stream of research, the advertiser

absorbs practically all of the risk. The publisher is paid the same rate regardless of the

performance of the ad campaign; therefore, from a revenue maximization point of view,

the publisher is just focused on delivering as many ads as possible. This is obviously not

an ideal situation for the advertisers.

The authors of the papers in the second research stream instead have chosen to

focus on the issue of attempting to create a schedule of ads which maximizes a

performance based measure and ignores the space constraint. Specifically, these papers

focus on the pure CPC pricing model where the publisher is paid a set fee each time a

user clicks on an advertiser's ad. Under a pure performance based model such as this, the

publisher absorbs all of the risk. The advertiser stands to lose very little regardless of

the level of effort which they devote to the relationship.

Given that the overall advertisement campaign performance is directly dependent

upon the quality of the products provided by both the advertiser (ad, product, offer, etc.)

and the publisher (Web site content, incentives, targeting effort, etc.) either of these pure

pricing models may lack the correct monetary incentives to maximize the efficiency of

the agreement. In an effort to achieve a more equitable risk sharing situation, many

companies are adopting a hybrid model which often includes the CPM model and one or

more of the performance based pricing schemes. According to industry experts, this type

of model enhances the efficiency of the relationship by improving monetary motivation

for both parties. We hope to bridge these two streams of research by introducing

methodology which addresses both the Web advertisement space limitations and the

performance based pricing models.









Widespread adoption of the performance based pricing models seems to have

provided the expected additional motivation. Publishers and advertisers are expending an

enormous amount of effort to improve their probability of serving ads to interested

consumers, while avoiding inconveniencing those who are uninterested. This is in the

best interest of all of the stakeholders (publishers, advertisers and customers). Common

efforts include, but are not limited to geographical targeting, content targeting, time

targeting, bandwidth targeting, complement scheduling, competitive scheduling and

frequency capping (please see chapter 5 for a more detailed description of these

practices). The overall goal is to identify a subset of Internet customers who may be

interested in a particular advertiser's product and to serve that advertiser's ad

accordingly. Given that the average click rate is less than 2%, this is a monumental task;

however, as a result of the incredible potential benefits, the devotion of time and effort is

well justified. These efforts complicate the task of ad scheduling and therefore need to be

considered. In this research, we will extend the current literature by introducing several

of these complexities and their resulting formulations while at the same time proposing

and testing several artificial intelligence based heuristic and meta-heuristic solution

techniques for each model.

Current academic research in online advertisement scheduling has provided a solid

foundation upon which we can build. The models introduced thus far are still commonly

used in industry; therefore, this work is very important. However, since the vast

majority of the industry is attempting, with limited success, to tackle a more complicated

mix of these factors, there is quite a bit of work left to be accomplished. We see this as a








great opportunity for the academic community and therefore will attempt to introduce and

provide potential solution techniques for several more complicated models.















CHAPTER 3
INFORMATION RETRIEVAL METHODOLOGIES

This chapter presents an overview of the field of information retrieval (IR). As this

field is very broad and multidisciplinary, we focus primarily on the subsets which are

relevant to our research. In section 3.1, we provide a basic introduction and a general

overview of the field of IR research. In section 3.2, we briefly discuss several common

data pre-processing methods. In section 3.3, we introduce the vector space model. In

section 3.4 we cover structural representation and in section 3.5 we introduce lexical

databases with a focused coverage of WordNet.

3.1 Overview

Information retrieval (IR) is focused on solving the issues involved with

representing, storing, organizing, and providing access to information [22]. The

underlying goal of IR is to provide a user with information which is relevant to his or her

indicated topic of interest or query.

Obviously, this is a very broad and daunting task. Through the early 20th century,

this area of research was of interest to a very small group of people, primarily librarians

and information experts. Their goal was to improve the methods by which a library

patron was provided information/books which were relevant to his or her topic of interest.

However, as a result of many incredible technological advances, the last few decades

have seen the focus and reach of IR broaden substantially. No longer are we limited to

the information that is available in our local library. Thanks to advances in electronic










storage and telecommunication infrastructures, we can now access information from all

over the world.

The Web has become a massive distributed repository of knowledge and

information which seems to grow in popularity and size every day. Today it is just as

easy to find information by electronically querying a database in Hong Kong as it is to go

to a local library to search for the information. In many respects, it is even easier.

Similarly, corporate employees often find that their company's valuable information

resources, which are likely to be widely dispersed around the globe and which used to be

impossible to locate, are now readily accessible via their corporate intranet. This ease of

access is wonderfully received by the users as the quantity of information which is

readily accessible to each person has grown exponentially; however, in turn so has the

challenge of determining which pieces of this information are actually of relevance.

Consequently, never has the task of information retrieval been more challenging or more

important.

IR has become a mainstream research area which is found in almost every

discipline and is of great interest to academicians, individuals and corporations. A much

more up-to-date definition of information retrieval which acknowledges many of these

drastic changes is provided by Wikipedia [23, p.1]. There information retrieval is defined

as "the art and science of searching for information in documents, searching for

documents themselves, searching for metadata which describes documents, or searching

within databases, whether relational stand alone databases or hypertext networked

databases such as the Internet or intranets, for text, sound, images or data." A basic

information retrieval system consists of three main parts: the user's query, the ranking









function and the corpus of information. The job of the ranking function is to successfully

match each query with the best subset of documents from the available set. This is

accomplished by ranking the documents based on their respective relevance levels. This

is a very difficult task and therefore has received considerable attention from the

academic community. The most popular ranking model is the vector space model which

was introduced by Salton in his seminal work [24]. This model is discussed in detail in

the next section. Recently, another very promising research stream by Fan et al. [25-29]

highlights the potential for using genetic programming to actively discover a good

ranking function. This is a technique which is definitely deserving of further review.

Other very active areas of IR research include, but are not limited to, query expansion,

relevance feedback and literature based discovery.

Query expansion is a research area which is focused on improving our ability to

understand and respond to user queries. Query expansion attempts to improve a query by

expanding it to include additional terms which are expected to improve the system's

ability to respond [30]. This area is particularly relevant as a result of the dramatic

popularity of Internet search engines.

Relevance feedback is an area of research which attempts to analyze a user's

relevance judgment with respect to the documents that are returned as a result of their

queries. Through the analysis, the hope is that the relevance judgment information can be

used to estimate a user's profile which will help to improve the system's ability to

respond to future queries by attempting to infer a user's real information need [31, 32].

Literature based discovery uses information retrieval techniques in an attempt to

uncover hidden connections between two indirectly related domains. The seminal work









in the area was conducted by Swanson [33]. In his work, Swanson discovered that fish

oil can be used as a successful treatment for Raynaud's syndrome. This technique has

become very popular and has found a special niche in the area of biomedical sciences.

3.2 Data Pre-processing

Prior to employment of the vector space model or similar technique, the raw data is

often pre-processed in an effort to eliminate data which is deemed to be of little

informational significance. Two very common pre-processing methods are stopping and

stemming. Stopping is a method by which all of the stop words are removed from each

document and query. Stop words are those words such as the, and, a, or, etc., which are

deemed to be of very little value with respect to the content representation of the

document. Using these words as index terms would only dilute the pool of keywords.

Removal of stop words also has a secondary benefit of compression, which reduces

considerably the size of the indexing structure. In doing so, computational complexity is

often drastically improved. After removing the stop words, the remaining words are

commonly stemmed. Stemming is a process by which the terms are converted to their

base form by removing all of the affixes (suffixes and prefixes). A computer cannot tell

that cooking, cooked and pre-cooked are essentially the same word; however, after being

stemmed all three would be converted into their base word cook. This is very important

because it enhances the process of determining the relative importance of a particular

term such as cook. If the words were left in their original form, the relative importance of

cook, which is commonly based on the number of occurrences of the term, would be

underestimated. These pre-processing steps often provide significant improvements in

the efficiency of the process.
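
The sketch below illustrates both steps on a single sentence; the stop list and the
suffix-stripping rules are deliberately naive stand-ins of our own (a production system
would use a full stop list and a real stemmer such as Porter's).

# Illustrative stopping and stemming (naive suffix stripping, not a real
# stemmer such as Porter's; stop list and rules are toy assumptions).
STOP_WORDS = {"the", "and", "a", "or", "of", "to", "in", "is"}
SUFFIXES = ["ing", "ed", "es", "s"]

def preprocess(text):
    tokens = [t.lower().strip(".,;:!?") for t in text.split()]
    tokens = [t for t in tokens if t and t not in STOP_WORDS]   # stopping
    stemmed = []
    for t in tokens:                                            # stemming
        t = t.replace("pre-", "")               # strip a sample prefix
        for suf in SUFFIXES:
            if t.endswith(suf) and len(t) - len(suf) >= 3:
                t = t[: -len(suf)]
                break
        stemmed.append(t)
    return stemmed

print(preprocess("The cook is cooking the pre-cooked meal"))
# -> ['cook', 'cook', 'cook', 'meal']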









3.3 Vector Space Model

A primary goal of information retrieval is to select, from a corpus of documents,

the subset which is relevant to a user's query. A very powerful IR method for achieving

this task is the Vector Space Model [24, 34].

The process begins by transforming each document and query into a series of

vectors by counting the frequency of occurrence of each keyword within each document

and query. These vectors are normalized (a number of normalization methods exist) and

are then used to determine the relevance of each document. The power of this process is

rooted in its ability to transform the textual aspects of the documents and queries into a

series of quantitative representations. The vector space model made major improvements

over its predecessor, the Boolean Model, by allowing for partial matching. Next, we

provide a more formal representation of the vector space model.

The vector space model is a theoretically well-grounded model which is easily

interpreted based on its geometric properties. Each document and query is represented by

a vector of key terms in $n$ dimensional space. A query $q_j$ and a document $d_k$ would be

represented as:

$\vec{q}_j = (w_{1,j}, w_{2,j}, \dots, w_{n,j})$

$\vec{d}_k = (w_{1,k}, w_{2,k}, \dots, w_{n,k})$

where $n$ is the total number of terms in the collection and $w_{i,j}$ represents the weight which

is assigned to the term $i$ for document $j$. The vector space model evaluates the relative

importance of document $d_k$ to query $q_j$ based on the degree of similarity between the two

corresponding $n$ dimensional vectors, $\vec{q}_j$ and $\vec{d}_k$ [35]. In order for this model to be

meaningful, the vectors must be normalized. The dot product of the two vectors, which









gives the cosine of the angle $\theta$ between the vectors (see Figure 4), will be 1 if the two

are identical and 0 if they are orthogonal. The similarity measure between document $d_k$

and query $q_j$ is as follows:

$sim(q_j, d_k) = \dfrac{\vec{q}_j \cdot \vec{d}_k}{\lvert\vec{q}_j\rvert \, \lvert\vec{d}_k\rvert} = \dfrac{\sum_{i=1}^{n} w_{i,j}\, w_{i,k}}{\sqrt{\sum_{i=1}^{n} w_{i,j}^2}\ \sqrt{\sum_{i=1}^{n} w_{i,k}^2}}$



This similarity score, also called the retrieval status value (RSV), is calculated for each

document-query combination and is used to rank the documents. A document's RSV

score is used as a proxy measure of its relevance for a given query. The documents are

ranked based on their RSV score and served to the user in descending order.















Figure 4: Geometric Representation of the VSM

One of the most important steps in the vector space model is finding a good set of

index term weights, $w_{i,j}$. The index term weights are responsible for providing an

accurate estimation of the relative importance of the keywords within the collection.

Without a good set of index term weights, the VSM loses its effectiveness very quickly.

In their seminal work on this problem, Sparck Jones [36] introduced the TF-IDF function

which is still the most widely used and is considered by many to be the most powerful

index weighting function. Although many content based features are available within the









vector space model that may be used to compute the index term weights, the two that are

most common, and the ones that are used in the TF-IDF function are the term frequency

(tf) and the inverse document frequency (idf). The basic TF-IDF function is as follows:


$w_{i,j} = tf_{i,j} \times \log\!\left(\dfrac{N}{df_i}\right)$

The term frequency, $tf_{i,j}$, is calculated by counting the frequency of

occurrence of term $i$ in document $j$. The larger the $tf$, the more important the term is

considered to be in describing the document or query. The inverse document frequency

is calculated as $(idf)_i = \log(N/df_i)$, where $N$ represents the total number of documents in the

collection and $df_i$ represents the total number of documents within which term $i$ appears.

The basic intuition behind the idf is that a keyword which appears in very few documents

is likely to be of greater value in classifying those documents than would be a keyword

which appears in all of the documents. The idf scores are assigned accordingly. The

keyword which appears in every document is assigned an idf score of 0 while a keyword

appearing in very few documents would receive a much higher idf score. By combining

the two, the TF-IDF function gives the greatest weight to terms which occur with high

frequency within a very small number of documents.
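
Putting the pieces together, the sketch below computes TF-IDF weights for a toy corpus
and ranks the documents against a query by cosine similarity; the corpus, query, and
whitespace tokenization are our own illustrative assumptions.

# Illustrative TF-IDF weighting and cosine-similarity ranking over a toy
# corpus (documents, query, and tokenization are our own assumptions).
import math
from collections import Counter

docs = ["online ad scheduling", "banner ad pricing models", "neural network search"]
query = "banner ad scheduling"

tokenized = [d.split() for d in docs]
N = len(docs)
df = Counter(term for toks in tokenized for term in set(toks))
vocab = sorted(df)

def tfidf(tokens):
    """Build a TF-IDF weight vector over the fixed vocabulary."""
    tf = Counter(tokens)
    return [tf[t] * math.log10(N / df[t]) for t in vocab]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

q_vec = tfidf(query.split())
rsv = sorted(((cosine(q_vec, tfidf(toks)), d) for toks, d in zip(tokenized, docs)),
             reverse=True)
for score, d in rsv:                 # documents served in descending RSV order
    print(f"{score:.3f}  {d}")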

The vector space model has stood the test of time. Although many alternative

ranking methods have been proposed since its inception in the late 60s, the general

consensus is that the VSM is as good or better than all of its competitors [22]. Although

no method has been able to take its place, several attempts to improve the basic vector

space model are gaining in popularity. Two of these efforts which are of particular

interest involve the inclusion of structural and lexical information within the model.










3.4 Structural Representation

The traditional VSM considers each document as a simple bag of words leveraging

only the resulting textual representation. This method has proven to be very useful and

effective, but many researchers including Halasaz [37] hypothesized that there might be

additional information which is overlooked by the basic VSM. This additional

information is found in the basic structure of the document. The fundamental idea is that

the location in a document where a term appears may provide additional information as to

how valuable that term may be in developing a characteristic representational vector for

the document. Consider the basic structure of an HTML document as an example. An

HTML document commonly consists of a series of independent sections such as the

header, keywords, title, body, anchor, and abstract. From a structural representation

point of view, a term which appears in the header might be more important than one

which appears in the anchor. Alternatively, a term which appears in the body and in the

anchor may be more important than one which just appears in the title. A number of

researchers, including Navarro and Yates [38, 39] and Burkowski [40], have developed

alternative models which incorporate document structure into the term relevance

calculation. Although many of these methods are criticized as being somewhat narrowly

focused and lacking in generalizability, the general consensus acknowledges that this

structural representation definitely contains important information and should therefore

be considered. Accordingly, we incorporate this type of information in our research

model.









3.5 WordNet

In the previous section, we describe the potential value of including structural

information within the VSM model. Lexical information may also be very useful. The

primary goal of a lexical reference system is to provide its users with word relationships,

whereby when a user inputs a word, the system provides a response which summarizes

how that word is related to other words. One system which incorporates lexical analysis

is WordNet. "WordNet is an online lexical reference system whose design is inspired by

current psycholinguistic theories of human lexical memory. English nouns, verbs,

adjectives and adverbs are organized into synonym sets, each representing one underlying

lexical concept. Different relations link the synonym sets [41, p.1]." The most basic

semantic relation upon which WordNet is built is the synonym [42]. Synsets, or sets of

synonyms, form the basic building blocks for the system. For example, hat is included in

the same synset (also called a concept) with lid and chapeau. Although the synonymy

between terms forms the basic structure of WordNet, the system is also organized based

on the relationships between differing synsets. These relationships are used to form

lexical hierarchies of concepts. For example, consider the terms robin, bird and animal.

The important relation that characterizes the relationship between these terms is

hyponymy. Hyponymy is the relation of subordination which represents the is-a

relationship. Robin is a hyponym (subordinate) of bird, and bird is a hyponym of animal.

These types of relationships are hierarchical in nature and therefore are easily represented

in the form of an inverted tree. The root node is represented by the most general term,

with the terms of each layer of nodes down the tree having a more narrow focus or scope.

In our simple example, animal would be the root node, bird would fall in the first layer

and robin would be in the second layer of nodes. Hyponymy is only one of many lexical









relations which are present in WordNet. Other common relationships which are present

include the part-of, has-part, member-of and entails relationships. The current version of

WordNet, version 2.1, includes 117,097 nouns, 11,488 verbs, 22,141 adjectives, 4,601

adverbs and 117,597 synsets [41]. Lexical systems such as WordNet are gaining in

popularity and are finding their way into many different fields of research.

Lexical reference systems have been used within information retrieval for many

different purposes including word sense disambiguation [43-45] and semantic tagging

[46, 47]; however, the use that is most pertinent to our research involves text selection.

Many researchers including Voorhees and Hou [48] and Gonzalo et al. [49] have shown

that the text selection process can be vastly improved by utilizing a lexical reference

system such as WordNet to enhance the process of developing a vector representation for

documents and queries. The basic idea is very similar to that of stemming. In stemming,

we convert terms into their root words in an effort to avoid underestimating the

importance of a particular term. Analogously, the goal of this research stream is to consolidate terms based on their synonymic relationships for the same purpose. With

stemming, we make the argument that cooks and cooked should not be treated as separate

words and that doing so would underestimate the relative importance of the term cook.

Extending this example, how should the terms cook and prepare be treated? Researchers

have shown that, in many cases, combining synonyms such as these enhances the

performance of the traditional vector space model. This task can be accomplished by

expanding the keyword indexing space to include synsets instead of being limited to just

the terms. Lexical systems such as WordNet have proven to be very powerful tools

which can often improve our information retrieval research models. This is only a brief









introduction to WordNet. Fellbaum [42] provides a much more thorough review of the system.
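As a brief illustration, the following Python sketch uses NLTK's WordNet interface (assuming the nltk package and its wordnet corpus are installed) to list the synsets of a term and a hypernym path of the kind described above:

# requires: pip install nltk, then nltk.download('wordnet')
from nltk.corpus import wordnet as wn

# the synsets ("concepts") that contain the noun "hat"
for s in wn.synsets("hat", pos=wn.NOUN):
    print(s.name(), s.lemma_names())

# a hypernym (is-a) path from the most general root concept down to "robin"
robin = wn.synsets("robin", pos=wn.NOUN)[0]
print([s.name() for s in robin.hypernym_paths()[0]])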















CHAPTER 4
LARGE SCALE SEARCH METHODOLOGIES

This chapter introduces the field of global optimization/global search. Section 4.1

provides a general overview. Sections 4.2 and 4.3 provide a detailed analysis of two

powerful metaheuristic techniques: genetic algorithms and neural networks, with specific

emphasis on the variations that will be employed in our research. The final section,

section 4.4, introduces and discusses the implications of the No Free Lunch Theorem.

4.1 Overview

Optimization is an extremely active field of research which falls in the interface of

applied mathematics, computer science and operations research. This area has received

considerable attention during the last several decades with researchers devoting a

considerable amount of effort to the development of improved solution techniques. This

is especially true for combinatorial optimization which is a subset of the set of global

optimization problems. Combinatorial problems, which are commonly found in many

functional areas including operations management, information technology,

telecommunication, etc., pose an enormous challenge due to their curse of

dimensionality. As a result of their combinatorial nature, as these problems grow in size,

their search spaces often become extremely large, discontinuous and complex, and

therefore these problems are extremely difficult if not impossible to solve to optimality in

polynomial time. The series of online advertisement scheduling problems that we

address in this work are all combinatorial in nature and have been proven to be NP-hard

[50, 51]. As is discussed below, common deterministic solution approaches are often









ineffective in handling these types of problems; therefore, researchers have developed

many other alternative heuristic techniques. Along these lines, one effort involves the

use of metaheuristics. A metaheuristic is an algorithmic framework, approach or method

that can be specialized to solve optimization problems [52]. Metaheuristics, which

represent an effort by the research community to leverage the enormous amount of

computing power that has become available during the recent past, are very popular

among academics and practitioners and have been used very successfully to attack some

of our most difficult optimization problems. We will first provide a brief overview of the

wide range of solution techniques which are applied to global optimization problems and

then we will turn our focus to the two specific metaheuristic techniques which we utilize

in our effort to provide appealing solution alternatives to the proposed online

advertisement problems.

As mentioned above, solution techniques for global optimization problems have

received considerable attention during the past couple of decades and they fall into two

main categories: deterministic and stochastic methods (see Figure 5). Deterministic

methods, which include both calculus based and enumerative based techniques, attempt

to find the local optima by developing a sequence of iterative approximations which

converge to a point which satisfies the necessary conditions. According to Filho et al.

[53], calculus based methods can be further subdivided into two classes: indirect and

direct methods. The indirect methods search for the local optima by solving the set of

equations which result from setting the gradient of the objective function equal to zero,

restricting the search space to those points with slopes of zero in all directions.
















Figure 5: Classes of Search Methods (Basic Model Borrowed from [54]). The figure depicts a taxonomy in which the deterministic methods divide into calculus based methods (direct and indirect) and enumerative methods, while the guided random search methods comprise simulated annealing, evolutionary computing (evolution strategies, evolutionary programming, genetic algorithms and genetic programming) and neural nets.

The direct methods seek the local optimum by instead hopping along the function via a

simple comparison technique formally known as hill climbing. Hill climbing begins with

a random starting point and at each step selects at least two additional points which are

located at predetermined distances from the starting point. Of the two, the point with the

most favorable local gradient is chosen as the starting point for the next step. Two

obvious limitations of these methods are that they are local in scope and that they only

work for smooth well-defined functions. As a result, according to Goldberg [55], they

lack the robustness to be very effective against real world problems which are often

combinatorial in nature. Alternatively, enumerative methods search every point in the

objective function's domain space one at a time. Obviously, these methods, unlike the

calculus based techniques, overcome the limited scope issue, guaranteeing that the global

optima will be identified. However, because of the enormous number of feasible









solutions for any large problem, this type of method requires significant solution time and

computing power for real-world applicability. As a result of the combinatorial nature of

the problems, the time to solve them grows exponentially with the size of the problem.

Most combinatorial optimization problems are NP-hard [50], and therefore are not likely

to have an algorithm which can find the optimal point in polynomial time. Even dynamic

programming, which is considered by many to be the most powerful enumerative

method, breaks down for all but the smallest of problems [55]. Simply put, these

techniques take too much time, lacking the efficiency to handle real-world problems [55].

In contrast, the guided random search methods, which include simulated annealing,

evolutionary computing and neural networks, attempt to intelligently cover a larger

portion of the search space identifying as many local optima as possible with hopes that

one of them will satisfy the global optimization conditions. These are guided random

search techniques which are all based on enumeration. What separates them from the

enumerative techniques discussed above is that they use supplemental information in an

effort to intelligently guide the search process. Simulated Annealing (SA), which was

independently invented by Kirkpatrick, Gelatt and Vecchi in 1983 [56] and by Cerny in

1982 [57], is based on the laws of the thermodynamic annealing process. This technique

attempts to deal with highly nonlinear combinatorial optimization problems by

mimicking the process by which a metal cools into minimum energy crystalline structure.

The search space is traversed probabilistically via a series of states which are based on a

cooling schedule. Proponents claim that this technique is very proficient at avoiding the

entrapment of local optima. The common theme of "nature knows best" holds with neural

and evolutionary computation (EC) as well, each being a solution technique which is









based on a naturally recurring process. Evolutionary computation has been used for over

three decades and is an attempt to apply Darwin's basic theory of evolution to artificial problem solving. In nature, organisms which are best suited

and equipped to compete for the limited pool of resources have a higher probability of

survival and are therefore more likely to prosper through the natural mating process. In

doing so, they propagate the process of survival of the fittest by passing on their genetic

information via the hereditary process. The work which makes up EC was begun

independently by two different researchers. Rechenberg [58] introduced evolution

strategies (ES) in an effort to achieve function optimization. Fogel [59] introduced

evolutionary programming based on his work with finite state machines. Genetic

Algorithms, which differ from evolutionary strategies and evolutionary programming in

their emphasis on the use of specific operators, especially crossover which mimics the

biological process of genetic transfer, was invented by John Holland and his colleagues at

the University of Michigan in 1975 [60]. Genetic Programming is an extension of

genetic algorithms within which members of the population are parse trees of computer

programs. We will discuss genetic algorithms in much more detail in section 4.2, as it is

one of our chosen solution techniques. Neural computation, which represents yet another

attempt by the research community to mimic a naturally occurring process, is believed by

many to have been proposed in the 1800s in an effort to determine how the human mind

worked; however, formal theoretical analysis is believed to have been started by

McCulloch and Pitts [20] in the 1940s. Current artificial neural network models offer

massively parallel, distributed systems inspired by our never ending attempt to model the

anatomy and physiology of the cerebral cortex. These systems exhibit a number of useful










properties including learning and adaptation, approximation and pattern recognition, and

have been successfully applied to many challenging application domains. We will also

cover neural networks in more detail in section 4.3, as they complement genetic

algorithms as one of our chosen solution approaches.

The online advertisement scheduling problems which we introduce are NP-hard,

commonly consisting of a complex search space which is discontinuous and multimodal.

As a result, the deterministic methods discussed above lack the robustness to effectively

handle all but the most trivial problem instances. In an effort to provide efficient solution

alternatives we propose methods based on the theory of genetic algorithms and neural

networks.

4.2 Genetic Algorithms

Genetic Algorithms, which were developed by John Holland [60], are intelligent

probabilistic search algorithms which have been applied to many different combinatorial

optimization problems [61]. As mentioned above, genetic algorithms are based on the

process of natural selection. During the course of evolution, natural populations evolve

based on the principle of survival of the fittest. Organisms which are more prepared and

better equipped to compete for the limited pool of resources tend to have a better chance

to survive and reproduce while those which are inferior tend to die off. As a result, the

genes from those highly fit organisms are likely to propagate in increasing numbers from

generation to generation. Consequently, the combination of favorable characteristics

from highly fit ancestors is likely to result in the production of individuals which are even

more fit than those which preceded them. This evolutionary process often allows species

to adapt in a way which makes them more and more capable of dealing with their

environments.









Genetic Algorithms attempt to simulate this process. A GA starts with an initial

population of individuals (chromosomes), each representing one potential solution to the

given problem. Similar to the evolutionary process, new generations of individuals are

iteratively created from this initial population via the application of three primary genetic

operators: reproduction, crossover and mutation. Each individual, which is encoded into

a string or chromosome, has a fitness value which is evaluated with respect to the

problem's objective function. The probability that an individual will have an opportunity

to survive and/or reproduce is based on its relative fitness value. Those individuals

(normally the highly fit candidates) which are chosen for reproduction generate offspring

via the crossover of parts of their genetic material. The children are made up of

characteristics which have been inherited from both parents. The offspring commonly

replace the entire prior population (generational approach); however, in some cases, a

portion of the prior population may survive (steady-state approach). In addition, a small

percentage of the genetic material of the individuals which make up the new population

undergoes mutation. This operator affords the GA the ability to reclaim important

genetic material which may have been unfortunately lost in an earlier generation. In

doing so, mutation effectively allows the GA to move to a different area in the search

space [62]. Pseudocode for a basic GA is provided below.

begin
Generate the initial population;
Assign fitness values to the individuals in the initial population;
Do until a pre-defined stopping criterion is met:
Select the strings that will survive as is;
Select strings for the mating pool;
Mate the chosen parents via crossover;
Apply mutation to the new strings earmarked for the next generation;
Evaluate the fitness of the new population;










Loop
end
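For illustration, the following Python sketch renders the pseudocode above for a toy one-max bit-string problem; the operators and parameter values are illustrative only and are not those used in our experiments:

import random

POP, BITS, GENS, MUT = 30, 20, 50, 0.02

def fitness(ch):
    return sum(ch)                      # toy objective: count of 1-bits

def select(pop):
    # fitness-proportionate (roulette wheel) selection of two parents
    return random.choices(pop, weights=[fitness(c) for c in pop], k=2)

def crossover(a, b):
    p = random.randrange(1, BITS)       # single-point crossover
    return a[:p] + b[p:]

def mutate(ch):
    # flip each bit with a small probability
    return [1 - g if random.random() < MUT else g for g in ch]

pop = [[random.randint(0, 1) for _ in range(BITS)] for _ in range(POP)]
for _ in range(GENS):                   # generational replacement
    pop = [mutate(crossover(*select(pop))) for _ in range(POP)]
print("best fitness:", max(fitness(c) for c in pop))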

Although they are based on a semi-random process, genetic algorithms, because of their ability to exploit historical information, can be expected to perform far better than a purely random search. As a result of their popularity,

many alternative GA permutations have been developed. These include different

representation, selection, crossover and mutation schemes. We have provided a very

basic description of the process; however, a more thorough account can be found in work

by Aytug et al. [63], Dumitrescu [64], Goldberg [55] and Mitchell [65]. Although GAs

do not guarantee optimality, because of their perceived relative proficiency in covering a

large portion of the search space and their relative ease of implementation, genetic

algorithms have been a very popular solution technique for combinatorial optimization

problems in many different disciplines [61]. One such area which deserves specific

mention as it is closely related to our work is scheduling. Genetic algorithms have been

applied to a plethora of different scheduling problems by many authors including Aytug

et al. [66, 67], Chen et al. [68], Biegel and Davern [69], Davis [70], Fang et al. [71],

Fogel and Fogel [72], Li et al. [73], Liaw [74], and Wang [75]. This is by no means an

exhaustive list. For a more thorough account, please refer to works by Aytug et al. [63]

and Back et al. [76].

One issue which has remained a significant challenge for GA researchers has been

the inclusion of constraints within the GA framework. Without constraints, the

comparison of individuals (chromosomes) within a GA, as detailed in the basic

description above, is fairly straightforward. The relative value of each individual is determined based on its performance with respect to the objective (fitness) function in









comparison with that of the other chromosomes. This is simple and intuitive. Dealing

with the infeasibility of a potential solution in the constrained case is much more

difficult. With the incorporation of constraints, we must consider not only each individual's performance with respect to the objective function, but also its performance against

each of the constraints. Given that the constraints often represent limited resources, this

is not an easy task. This problem has received considerable attention and as a result

many alternative approaches have been developed. These include the use of penalty

functions, maintaining feasibility, and separating objectives and constraints. Coello [77],

Michalewicz [78, 79] and Sarker [80] provide very good surveys and critiques of the

many contrasting ideas (in addition, Coello maintains a dedicated Web site [77] within

which a list of related work is published). Unfortunately, although many different

techniques have been proposed, none of them has gained a consensus as being the best.

Instead, as Coello [77] points out, most of the techniques are very good at handling

certain types of problems, but their generalizability is very limited.

Given that our online advertisement scheduling problems fall into the constrained

optimization category, this lack of consensus presents a challenge. We employ a method

which maintains feasibility. This technique, which is thought by many including Coello

[77] and Liepins et al. [81, 82] to be a very good technique relative to the other

alternatives, involves the use of repair operators which maintain feasibility. Instead of

searching the entire landscape, this method limits itself to the feasible region. This

technique has been employed successfully by Raidl [83], Michalewicz and Nazhiyath

[84], Tate and Smith [85], Xiao et al. [86, 87], and many others. One factor which limits

the generalizability of this technique is the need for a problem-specific repair operator.









However, this is not extremely limiting because, for most problems, identification of

sufficient repair operators is not a difficult process. For all of the problems that we

propose, greedy-based heuristics are available or will be developed. We will use these

heuristics as our repair operators.
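As a simple illustration of a repair operator that maintains feasibility, the following Python sketch (the data and eviction rule are hypothetical) removes ads from an over-full banner slot until its capacity constraint is satisfied:

def repair(slot_ads, heights, capacity):
    # greedy repair: while the slot's combined ad height exceeds capacity,
    # evict the smallest ad (a hypothetical criterion chosen to lose the
    # least space per eviction) until the slot becomes feasible
    ads = list(slot_ads)
    while sum(heights[a] for a in ads) > capacity:
        ads.remove(min(ads, key=lambda a: heights[a]))
    return ads

heights = {"a1": 40, "a2": 25, "a3": 60}
print(repair(["a1", "a2", "a3"], heights, 90))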

Another issue that must be considered when employing this type of constrained

GA solution technique is how to utilize the repaired chromosomes. The basic question is

whether or not the repaired individuals should be allowed to reenter the general

population. Proposed methods to handle this situation cover the entire continuum from

those such as Liepins [81, 82] who recommends that none of the repaired solutions be

allowed to enter the general population, to those such as Nakano [88] who argue that

every chromosome should be returned. Not surprisingly, many others such as

Michalewicz and Dasgupta [89] and Orvosh et al. [90] argue that the best alternative falls

somewhere in the middle. We follow the recommendation of Nakano [88], not

discarding any of the chromosomes.

4.3 Neural Networks

Mankind has long been fascinated and endlessly motivated in our attempts to

understand the amazing power of the human brain. Although our understanding of this

process is considered by most to be extremely incomplete, years of research have led to a

basic understanding of how the brain trains itself to process information. Neural

networks found in the brain consist of billions of specialized cells called neurons. These

neurons are organized into a very complicated intercommunication network via axons

and dendrites (see Figure 6). Each neuron is connected to, and therefore collects

information from, tens of thousands of other neurons. Information is shared in the form

of varying degrees of electrical impulses. A neuron sends out an electrical signal through










a specialized extension called an axon. At the end of each axon, a specialized structure,

called a synapse, converts the signal into a series of electrical effects which may inhibit

or excite activity from the connected neurons. As a result of this activity, the connected

neurons will make adjustments to their electrical activity. Learning occurs via a series of

adaptations of the sensitivity/connection strengths of the synapses which in turn changes

the level of influence that one neuron may have on another. It is the many different

patterns of firing activity which are created by this simultaneous cooperation among the

extensive network of neurons which provides the astounding processing power of the

brain.




Figure 6: Pictorial Representation of the Cerebral Cortex [91]

Artificial Neural Networks (ANN) attempt to mimic and, in doing so, leverage

some of the astounding power of this process. Although, as a result of our limited

understanding of the process and the limited amount of computing power available, our

efforts obviously represent a gross simplification of the natural process, the resulting

models have proven to be very powerful. The common sentiment is that McCulloch and

Pitts [92] in the 1940s were the first researchers to attempt to quantitatively model this

process. Since then, many researchers have attempted to improve upon their work. As is

the case with genetic algorithms, this intense research focus has resulted in many









different variations of ANNs. We will not, in this work, attempt to introduce all of them.

Instead, we will provide a general description of the most basic model and discuss in

detail the model variation that we have chosen to employ.

ANNs consist of a series of processing units often called nodes which represent the

neurons of the artificial system. Each node has a series of inputs, an activation function

and an output. The nodes are configured in the form of an interconnected network with

each connection having an associated weight which determines the relative strength of

the connection (see Figure 7). These weights determine how influential one node

(neuron) will be on another. Through an iterative series of steps, inputs are commonly

transferred from one layer of nodes to another. Having received its inputs, a node

preprocesses the information and applies an activation function to the preprocessed data

in creating the final output of the node. Many different preprocessing functions have

been proposed and tested, such as summation, cumulative summation, and product of the

weighted inputs. Likewise, many different activation functions have been proposed and

tested. These include, but are not limited to, linear functions, step functions, hyperbolic

tangent functions, sigmoid functions, and tangent-sigmoid functions. In addition, many

different variations of the artificial neural network have been developed with respect to

the network topology, directions of information propagation and weight modification

schemes. Given the number of alternatives, cataloging the variations of ANNs would be

a formidable task. For a more thorough review, please see the work by Kartalopoulos

[93].
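The following minimal Python sketch illustrates such a forward pass for a single hidden layer, using a weighted-sum preprocessing function and a sigmoid activation (the weights shown are arbitrary):

import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def layer(inputs, weight_matrix, biases):
    # each node: weighted sum of its inputs, then the activation function
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weight_matrix, biases)]

x = [0.5, -1.2, 0.3]                           # inputs
hidden = layer(x, [[0.2, -0.4, 0.1],           # two hidden nodes, three inputs each
                   [0.7, 0.5, -0.3]], [0.0, 0.1])
output = layer(hidden, [[1.0, -1.0]], [0.0])   # one output node
print(output)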




















Figure 7: Pictorial Representation of a Basic Feed Forward ANN (inputs, hidden layer, outputs) [91]

Since their inception, ANNs have been widely applied in many domains such as

classification, pattern recognition, time series prediction, and optimization. The idea of

using neural networks as a solution approach for NP-hard optimization problems [50, 51]

originated in 1985 when Hopfield and Tank [94] applied a Hopfield neural network as a

solution technique for the Traveling Salesman Problem (TSP). They used an Energy

Function to capture the constraints of the problem. This Energy Function was then

minimized using a neural network. Since this seminal work, this area has received

increasing attention with many researchers developing new techniques and attacking a

number of different NP-hard optimization problems with neural network methodologies.

Several researchers have applied neural networks to different variations of scheduling

problems. These include Poliac et al. [95], Sabuncuoglu and Furgun [96], Foo and

Takefuji [97], Lo and Bavarian [98], Satake et al. [99], and Lo and Hsu [100]. For a

more thorough review, please see the works by Burke and Ignizio [101], Looi [102],

Sabuncuoglu [103], Huang and Zhang [104], and Smith [105].

We have chosen to employ an Augmented Neural Network (AugNN) which is a

very promising variation of the neural network architecture proposed by Agarwal et al.










[106, 107]. Common complaints concerning many other neural network based

approaches for combinatorial optimization problems are that they tend to get stuck in

local optima and that they are often very inefficient with respect to computational

complexity. As a result, their performance often deteriorates exponentially with problem

size. These concerns have been especially common for Hopfield-based solution

techniques. Early applications of the AugNN procedure are very promising

with respect to these concerns. In the AugNN approach, the traditional neural network

model is augmented to allow for the embedding of domain and problem-specific

knowledge. AugNN is a metaheuristic approach which takes advantage of both the

heuristic approach and the iterative local-search approach. AugNN utilizes a proven base

heuristic to exploit the problem specific structure and then iteratively searches the local

neighborhood randomly in an effort to improve upon the initial solution. In this

approach, the optimization problem is formulated as a neural network of input, hidden

and output nodes. As in a traditional neural network, weights are associated with the

links between the nodes. Input, output and activation functions are designed which both

model the constraints and apply a particular base heuristic. The chosen base heuristic

provides the algorithm with a starting solution/neighborhood in the feasible search space.

After at most n iterations, or an epoch, a feasible solution is generated. A learning

strategy is used to modify the relative weights after each epoch in an attempt to produce

improved neighborhood solutions in future epochs. If improvements are found from

epoch to epoch, the weights may be reinforced. In addition, if an improved solution is

not identified within a predetermined number of epochs, the algorithm has the ability to

backtrack. Because of its learning characteristics and its ability to leverage the relative









problem structure, AugNN tends to find good solutions faster than traditional metaheuristic approaches such as Tabu Search, Simulated Annealing and Genetic Algorithms.

In addition, because it does not involve any relaxations or LP solutions, it is a relatively

simple technique in terms of computational complexity and ease of implementation. The

procedure was initially applied to the task-scheduling problem by Agarwal et al. (2003).

More recently, AugNN approaches have been successfully applied by Agarwal et al. to

the flow shop scheduling problem [108], the task scheduling with non-identical machines

problem [109], the open shop scheduling problem [110], and to the bin packing problem

[111].

One consideration that must be addressed when developing an AugNN based

model is which heuristics to utilize. Original efforts were focused solely on the use of

greedy-based heuristics; however, Agarwal et al. [107] have recently demonstrated that

this may not always be the best strategy. They prove that using a combination of greedy

and non-greedy heuristics within the augmented neural network formulation provides

better solutions than using either alone. We follow this strategy when developing our

AugNN models. See the experimental design for more details.
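The following Python sketch conveys the general flavor of this approach, pairing a weighted greedy base heuristic with epoch-by-epoch weight perturbation, reinforcement and backtracking; it is a simplified illustration under assumed data, not the actual AugNN input, output and activation functions used in our models:

import random

def weighted_greedy(heights, weights, capacity):
    # base heuristic: place ads in descending order of weighted height
    order = sorted(heights, key=lambda i: heights[i] * weights[i], reverse=True)
    chosen, used = [], 0
    for i in order:
        if used + heights[i] <= capacity:
            chosen.append(i)
            used += heights[i]
    return chosen, used

heights = {"a": 40, "b": 35, "c": 30, "d": 20}   # hypothetical ad heights
weights = {i: 1.0 for i in heights}              # link weights guiding the heuristic
best_used, best, good = 0, None, dict(weights)
for epoch in range(100):                         # one feasible solution per epoch
    chosen, used = weighted_greedy(heights, weights, 90)
    if used > best_used:
        best_used, best, good = used, chosen, dict(weights)   # reinforce these weights
    else:
        weights = dict(good)                     # backtrack to the last good weights
    for i in weights:                            # learning step: perturb the weights
        weights[i] *= 1 + random.uniform(-0.1, 0.1)
print(best, best_used)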

4.4 The No Free Lunch Theorem

Although genetic algorithms and neural networks are extremely popular and have

both been applied to a wide variety of optimization problems with very favorable

reported results, recently many have begun to question their relative performance

comparisons. The No Free Lunch Theorem (NFL) presented by Wolpert and Macready

[112, 113] proves that generalized deterministic and stochastic algorithms, which are

often called black box algorithms, such as genetic algorithms, neural networks and

simulated annealing techniques have the same average performance when executed over









the entire set of problem instances. This theorem implies that these techniques are no

better with respect to average performance than a random search method over the entire

set of problems. Critics of the theorem initially responded that researchers rarely claim that their technique outperforms some other algorithm for ALL problem instances, but instead typically claim superiority only for a subset of the problem space. In response to this criticism, Schumacher et al. [114] proved that the theorem also holds for subsets of the problem instances. This line of

research has initiated and motivated academic discussion on some very difficult questions

that needed to be asked; however, it is fortunately not as limiting as it may seem at first

glance. Whitley and Watson [115], Koehler [116], Kimbrough et al. [117], and Igel and Toussaint [118] indicate that there are a number of situations for which the NFL theorem does not apply. Whitley and Watson [115] point out several limitations of the NFL theorem, including that it has not been proven to hold for the set of problems in the NP class of complexity. They also indicate that proving this would establish that NP ≠ P. Koehler [116] extends this body of research by showing that the NFL theorem holds for only

trivial subclasses of the binary knapsack, traveling salesman and partitioning problems.

Although these NFL theorems are not quite as widely applicable as once thought, every

effort should be made to understand and consider their implications.















CHAPTER 5
RESEARCH MODEL(S)

Two of the most pressing problems facing companies in the online advertising

industry are estimation of user behavior and advertisement scheduling. We will propose

and test potential solution techniques for each of these problems.

In the previous chapters, we have provided a basic introduction to online

advertising, information retrieval, and large scale search, each of which play an integral

part in our research model. In this chapter, we will outline the specific online advertising

problems that are addressed in this research, and we will discuss, in detail, our research

plan for each of those problems. In section 5.1, we provide a brief summary of the

problems at hand. In section 5.2, we discuss in detail the proposed method for estimating

user behavior with respect to online advertisements. In section 5.3, we discuss in detail

three new variants of the NP-hard online scheduling problem and our proposed solution

techniques.

5.1 Problem Summary

As presented in section 2.2, the initial pricing model for online advertising was the

CPM (cost per mille) model which was adopted from the traditional print and television

media industries. The payment structure of the CPM model is based solely on the

number of ad impressions served. The publisher is paid a set fee for each ad impression

which is served, regardless of its effectiveness. Under this model, financial

considerations motivate the publisher to focus primarily on only one thing; serving as

many ad impressions as possible. Within the print and television mediums, unlike the









online medium, it is very difficult to determine the effectiveness of a particular

advertisement. Trends in sales and revenue generation can only be indirectly tied to a

particular advertisement campaign. Unless a consumer specifically indicates that their

purchase is the result of a particular advertisement to which they were exposed, it is all

but impossible to make that connection. However, this is not the case with online

advertising. As a result of the two-way flow of information, it is often much easier to

measure the effectiveness of an online advertisement. Immediate post-ad exposure

behavior by the user is easily tracked by the advertiser and the publisher. They can

immediately tell if the user clicks on the advertisement, sets up an account with the

advertiser, makes a purchase from the advertiser, etc. This behavioral visibility has led

many advertisers to question the efficacy of the CPM pricing model for the online

industry. They believe that, based on the more open, bidirectional, flow of information, it

may be in everyone's best interest to instead have pricing tied to one or more of these

user behaviors. As a result, several performance based pricing models such as CPC (cost

per click), CPS (cost per sale) and CPA (cost per acquisition) were developed and have

become extremely popular. There are many firms still using the pure CPM model;

however, a large portion of the industry has moved to models which are either based

solely on performance measures or are hybrids which incorporate both the CPM and

performance based criteria. These models are generally considered to provide a more

equitable risk sharing relationship. With this industry movement towards performance

based pricing models, the task facing many publishers has become much more difficult.

No longer can they simply focus on randomly serving as many ads as possible, but are

now faced with two major challenges. In an effort to maximize the utilization of their









most precious resource, advertising space, they must still attempt to serve as many ads as

possible, but in addition they must now attempt to do it intelligently. This leaves

publishers with two distinct problems. First, they must attempt to estimate user behavior

with respect to specific ads, and second, they must attempt to schedule the delivery of

advertisements accordingly. We propose potential solution techniques for each of these

problems.

5.2 Information Retrieval Based Ad Targeting

Given the current popularity of performance based pricing within the online

advertising industry, many publishers find that a large portion of their revenue stream is

dictated by the actions of the users. Each time a user shows interest in a particular

advertisement by clicking the ad, making a purchase, etc., the publisher is paid a fee. In

an effort to maximize revenue, publishers are eager to increase the probability of these

actions taking place. One assumption that underlies this portion of our work is that the

likelihood of a user taking action with respect to a particular advertisement increases with

a higher level of interest for the given good or service which is being advertised. This

assumption is very logical and is widely accepted within the industry and the academic

research community. Based on this assumption, the obvious solution would be for

publishers to serve users advertisements for products and services for which they have

sincere interest; however, this is a difficult task.

Unfortunately for publishers, estimating a user's affinity for certain products and

services is a very challenging and controversial task. Although users and publishers have

common interests, in that users would rather be exposed to ads for products and services for which they have an interest than to those for which they do not, the real problem

comes in getting from point A to point B. How does a publisher gain an understanding of









a user's interests? This is a very sensitive subject that must be addressed with extreme

caution. Users are very protective of their privacy and the efficiency of their Web surfing

experience. Any efforts on the part of a publisher which violate either are very likely to

have a depressing effect on long term corporate revenue.

As is indicated in section 2.2, several methods of developing this estimation of a

user's behavior have been proposed and tested. These methods include, but are not limited to, analyzing a user's search queries [16, 17], analyzing a user's prior click behavior [18] and analyzing user-specific demographic data [21]. We recommend and

test an alternative approach which is analytically appealing and not overly intrusive. We

recommend a method which is based on the detailed analysis of the raw html which

composes a user's recent Web surfing history. The basic intuition is that by analyzing a user's recent browsing history, we can improve our understanding of that user's current

interests. To the best of our knowledge, no one else has specifically recommended or

tested this type of an approach, and therefore we feel that our unique application of

information retrieval and lexical techniques as a method of analysis will contribute

positively to the current body of literature. Our goal is to provide those in industry with a

viable alternative with which to address this difficult challenge.

The basic steps of our process are as follows:

1. Track a user's surfing behavior for a predetermined period of time
2. Collect the corresponding html pages
3. Develop a characteristic array for each user by parsing their respective html pages
using IR and lexical-based techniques
4. Develop a characteristic array for each advertisement
5. Using a chosen similarity measure, evaluate each ad/user combination
6. Serve the ads accordingly and measure the effectiveness of the model










Steps 1 & 2: Tracking a user's surfing behavior and collecting HTML pages.

We tracked the surfing behavior of 68 students for a period of at least 2 hours and

captured their respective html files accordingly. Fourteen of the students failed to follow the instructions in one way or another, leaving us with 54 users for the project. Students were chosen as the users for this project based solely on their availability and willingness

to participate.

Steps 3&4: Develop a characteristic array for each user and advertisement.

Developing a characteristic array for each user is a three step process. First, we parse the

set of html pages for a given user into a term vector. Second, we determine the relative

importance of each term with respect to developing a characterization of the user's interests through their chosen html pages. To accomplish this task we employ several variations of the basic model set forth by Cecchini [119], which incorporates the use of

WordNet concepts. We modify his basic model to also allow for structural analysis as

follows:






(1)  $w_{i,u} = \left( \sum_{d=1}^{N} \sum_{z} s_z \, \frac{tf_{i,z,d}}{dl_d} \right) \frac{df_i}{N}$

where:

w_{i,u}     the importance of term i in domain u
s_z         weight factor assigned to structural element z
tf_{i,z,d}  the frequency of term i in structural element z in document d
dl_d        the document length (total number of terms) of document d
N           the number of html documents which are present in a user's domain u
df_i        the total number of documents within which term i appears

Function 1, which calculates an estimate of the relative weight of each term i in user

domain u, is composed of two distinct parts. The first is a weighted term frequency

calculation which is normalized by the document length and which incorporates an s_z term into the weight function. The s_z term, $0 \le s_z \le 1$, represents the relative weight

which is assigned to structural element z of the html documents. This term allows us to

employ structural analysis which has been recommended by several researchers including

Navarro and Yates [38, 39] and Burkowski [40]. Recall that the structural elements of

the html documents that we will consider in our analysis include the keywords, body and

title. Each of these sections is easily identified by its start and stop tags. The basic

intuition behind structural analysis is that the terms found in one part of the document

may hold more information than those which are found in other sections. For a particular

weighting scheme the assignment of a higher weight to a particular section follows from

an underlying assumption that the associated section will produce concepts which are of

higher informational value than the alternative sections which have been assigned a lower

weight. For example, in Scheme 2 from Table 1, the keywords section is assigned the largest weight of 0.7; therefore, it receives considerably more prominence than the title and body sections. We test several different weighting schemes within our analysis in an attempt to identify the best weighting combination. The tested weighting schemes are detailed in Table 1 below. Although it was not extremely prevalent, we did find that a

small percentage of the html documents did not have a keywords section. As a result,

you will notice in the referenced table, we have provided a contingent weighting

distribution for each of the schemes which overrides the original scheme in this situation.










Table 1: Structural Element Weighting Schemes

                           Title   Body   Keywords
Scheme 1                   0.3     0.2    0.5
Scheme 1 (if no KW sect.)  0.7     0.3    0
Scheme 2                   0.2     0.1    0.7
Scheme 2 (if no KW sect.)  0.7     0.3    0
Scheme 3                   0.25    0.5    0.25
Scheme 3 (if no KW sect.)  0.3     0.7    0
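As an illustration, the following Python sketch computes function 1 for a single term under the Scheme 1 weights from Table 1; the parsed document structure and data are hypothetical:

SCHEME1 = {"title": 0.3, "body": 0.2, "keywords": 0.5}

def domain_weight(term, docs, s):
    # function 1 sketch: docs is a list of dicts mapping each structural
    # section to its token list (a hypothetical parse of the html pages)
    N = len(docs)
    total, df = 0.0, 0
    for doc in docs:
        dl = sum(len(tokens) for tokens in doc.values())       # document length
        tf = sum(s[z] * doc[z].count(term) for z in s if z in doc)
        total += tf / dl                                       # structurally weighted tf
        df += any(term in doc.get(z, []) for z in s)
    return total * df / N      # second factor rewards terms spread across documents

docs = [{"title": ["golf", "gear"], "body": ["golf", "clubs", "on", "sale"],
         "keywords": ["golf", "sport"]},
        {"title": ["news"], "body": ["golf", "major", "results"],
         "keywords": ["golf"]}]
print(domain_weight("golf", docs, SCHEME1))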


Unlike the traditional IR task of separating documents based on their individual

representations, we are instead attempting to develop one representation for the entire set

of documents for a given user u. Given this objective, a term which appears in many of

the documents is anticipated to have greater informative power than one which only

appears in a small number of documents. This is the motivation behind the second term of function 1, $df_i / N$. In function 2, we generalize our term representation scheme by introducing the notion of concepts, c. Each concept c represents a synset and is composed of the terms $\{i_1, i_2, \ldots, i_n\}$ that make up that synset. Recall from our discussion


of WordNet that a synset is composed of the chosen term and all of its synonyms.

Considering concepts allows us to avoid over or under estimating the importance of a

particular term by aligning it with its synonyms. Function 2 provides a concept weight

by summing up the weights for all terms i in the synset c: $wc_{c,u} = \sum_{i \in c} w_{i,u}$ (function 2). In some cases, the same term may appear in more than one concept. We adopt a method introduced by Sacaleanu and Buitelaar [120] in function 3 to facilitate the assignment of the term to one of the concepts in this situation. Function 3 includes an additional term, T/|c|, where T is the total number of terms within a concept c that are present in the domain and |c| is the cardinality of the concept. The term is










assigned to the concept with the highest score from function 3. This functional analysis

will result in the interests of each user u being represented as:

$U_u = (wc_{1,u}, wc_{2,u}, \ldots, wc_{n,u})$

where n is the total number of concepts in the domain of user u and wc_{c,u} represents the

weight which is assigned to concept c for user u. Please see appendix C1-C5 for sample

input, word and weighted concept files for a user in our study.
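The following Python sketch illustrates functions 2 and 3 as described above; the term weights and synsets shown are hypothetical:

def concept_weight(concept_terms, term_weights):
    # function 2 sketch: a concept's weight is the sum of the weights of its
    # member terms that occur in the user's domain
    return sum(term_weights.get(t, 0.0) for t in concept_terms)

def best_concept(candidate_concepts, term_weights):
    # function 3 sketch: score each candidate concept as
    # (sum of matched term weights) * T / |c|, where T counts the concept's
    # terms present in the domain and |c| is the concept's cardinality
    def score(c):
        T = sum(1 for t in c if t in term_weights)
        return concept_weight(c, term_weights) * T / len(c)
    return max(candidate_concepts, key=score)

w = {"cook": 0.9, "prepare": 0.4, "fix": 0.1}
concepts = [("cook", "prepare", "fix"), ("cook", "chef")]
print(best_concept(concepts, w))      # the ambiguous term "cook" is assigned here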

The last step in this stage of the process is to develop a similar vector

representation for each advertisement. This was completed semi-manually. First, we

manually assign descriptive terms to each advertisement. Please see appendix B for a list

of the ads and their respective characteristic arrays. In a real world application, we

recommend that this task be completed by a marketing expert for each product or service.

Next, WordNet was used to develop a concept representation for each of the

advertisements. Finally, we assigned a relative importance weight to each concept for a

given advertisement. This weight will represent the relative importance of that concept in

describing the given advertised product or service. This process will result in each

advertisement being represented as:

$A_j = (wc_{1,j}, wc_{2,j}, \ldots, wc_{n,j})$

where n is the total number of concepts in the domain of advertisement j and wc_{c,j} represents the weight which is assigned to concept c for advertisement j. Although

manual development of vector representations is not uncommon, and in some cases offers

improved accuracy, it is probably not the most efficient [22]. It works well for our










project, but it may not be a feasible alternative in a large scale operation; therefore, one

extension to our work may be to attempt to automate this process for advertisements.

Step 5: Using a chosen similarity measure, evaluate each ad/user combination.

The goal of this model is to rate advertisements on their likelihood of being of interest to

a particular user. We estimate these likelihoods based on the similarities of the

respective user and advertisement vector representations via the vector space model [24,

34]. Recall from section 3.3 that the vector space model estimates the similarity between two n-dimensional normalized vectors based on the size of the angle θ which separates them in n-dimensional space. The cosine of θ is calculated by taking the dot product of the two vectors. In order for us to apply a similar technique, we first need to adapt our

advertisement vectors A_j to include a term for each concept which is present in the user's domain space u. This is accomplished as follows:

$A_{c,k} = \begin{cases} wc_{c,k}, & \text{if concept } c \text{ is present in the user's domain space } u \\ 0, & \text{otherwise} \end{cases}$

where n is the number of concepts in the domain space u of the user. Given the two n-dimensional vectors U_u and A_k, we calculate their similarity as follows:

$sim(U_u, A_k) = \frac{\sum_{c=1}^{n} wc_{c,u} \, wc_{c,k}}{\sqrt{\sum_{c=1}^{n} wc_{c,u}^2} \, \sqrt{\sum_{c=1}^{n} wc_{c,k}^2}}$

This similarity score, also called the retrieval status value (RSV), is calculated for each ad/user combination and is used to rank the advertisements. An advertisement's RSV score is used as a proxy measure of its relevance for a particular user; the higher the score, the greater the presumed relevance.
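The following Python sketch illustrates this step, expanding hypothetical ad vectors into a user's concept space and ranking the ads by RSV:

import math

def align(ad, user_concepts):
    # the ad's weight for each concept present in the user's domain space, else 0
    return [ad.get(c, 0.0) for c in user_concepts]

def rsv(u, a):
    dot = sum(x * y for x, y in zip(u, a))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in a))
    return dot / norm if norm else 0.0

user_concepts = ["golf", "travel", "finance"]        # hypothetical synsets
user_vec = [0.7, 0.2, 0.1]
ads = {"golf_shop": {"golf": 0.9, "sport": 0.3},
       "bank": {"finance": 0.8}}
ranked = sorted(ads, reverse=True,
                key=lambda a: rsv(user_vec, align(ads[a], user_concepts)))
print(ranked)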










Step 6: Serve the ads accordingly and measure the effectiveness of the model.

The last phase of this part of our project is to evaluate the effectiveness of the model. We

created a corpus of advertisements consisting of 100 arbitrarily chosen ads (for a list of

the products and services which are represented by this corpus of advertisements please

see appendix B). From this corpus, each user was provided with a set of advertisements

and asked to rate, on a scale of 1 to 5, their level of interest in the respective product or

service, using the following scale:

Product/Service Ranking Scale
1   No Interest
2   Little Interest
3   Moderate Interest
4   High Interest
5   Very High Interest


One subset of the advertisements which were served to a given user was chosen

randomly (20 ads), while the remaining ads were selected based on the similarity ranking

functions described above (the top 20% of ads for each weighting scheme were selected).

As expected, there was quite a bit of overlap in the ads which were selected by the

different weighting schemes. Any ad which appeared in more than one category was

only served once.

Hypothesis 1: The IR based ad selection method will be more effective than the
commonly used random model in selecting targeted ads from a given
advertisement corpus with respect to the level of interest to a given user.


As previously mentioned, in an effort to develop a good set of initial structural weighting

schemes, we manually analyzed the raw code from a sample of the users' html

documents. A secondary result of this analysis is that we also developed a preliminary










opinion as to the relative importance of the different structural sections of the html

documents. We attempt to test this intuition in Hypothesis 2.

Hypothesis 2: Within our model, the keyword section of the html documents
provides more information than the other structural sections.

Results of the experiments and tests of the hypotheses will be presented in detail in Chapter 6.

5.3 Online Advertisement Scheduling

The second major challenge that faces publishers is the development of an

advertisement schedule. This is a difficult task that must be repeatedly performed by

each publisher. The most precious resource that a publisher has is their online

advertising space; therefore, they must make every attempt to use it as efficiently as

possible. Developing a good schedule is widely accepted as the most important task in

achieving the publisher's goal to maximize revenue. Although online advertisements

may take several different forms including rich media and popups, in this work we have

chosen to focus on the most common form, banner advertising. Banner advertising has

long been the staple of the online advertising industry and it is still vitally important. In

an attempt to make the best use of their advertising space, publishers proactively develop

ad schedules for their predefined planning horizon. This problem takes the form of a

constrained optimization problem. Although every publisher is faced with a similar

problem, the relative model complexity can vary substantially depending on which

pricing model is chosen, and whether or not any other efforts are employed. We propose

three extensions of the basic Maxspace model. The new models extend the basic

Maxspace model to include efforts that are very common in industry. Solution

techniques for solving each of these extensions are proposed and tested.









5.3.1 The Modified Maxspace Problem (MMS)

The most basic situation with respect to online advertisement scheduling involves a

pure CPM pricing model and no additional incorporation of intelligence by the publisher.

This problem was introduced into academic literature in 2002 by Adler, Gibbons and

Matias in their seminal online advertising paper [6]. They named it the Maxspace

problem because the primary goal of the publisher is to schedule ads in a manner which

uses the maximum amount of available advertising space. The first model that we

propose is a slight variation of this basic Maxspace problem. Unlike the basic Maxspace

problem which has a hard frequency constraint requiring that an ad be served exactly w_i

times within the planning horizon if it is selected, this problem instead allows for an

acceptable frequency range for each advertisement which is very common in practice.

The frequency bounds provide needed flexibility to the publishers.

An IP formulation of this problem is as follows:

Maximize  $\sum_{j=1}^{N} \sum_{i=1}^{n} s_i x_{ij}$

subject to:

(1) $\sum_{i=1}^{n} s_i x_{ij} \le S$,  j = 1, 2, ..., N

(2) $L_i - M(1 - y_i) \le \sum_{j=1}^{N} x_{ij} \le U_i + M(1 - y_i)$,  i = 1, 2, ..., n

(3) $-M y_i \le \sum_{j=1}^{N} x_{ij} \le M y_i$,  i = 1, 2, ..., n

(4) $x_{ij} = 1$ if ad i is assigned to ad slot j, and 0 otherwise

(5) $y_i = 1$ if ad i is assigned, and 0 otherwise

where:

n     number of advertisements
N     number of banner/time slots
S     banner height
s_i   height of advertisement i, i = 1, 2, ..., n
L_i   lower limit on the frequency of ad i, i = 1, 2, ..., n
U_i   upper limit on the frequency of ad i, i = 1, 2, ..., n
M     a large penalty greater than the number of banner/time slots


The first constraint ensures that the combined height of the set of ads which are

scheduled for each banner slot does not exceed the available space. An assumption of the

model is that if an advertisement is chosen, the number of delivered impressions for that

ad must fall between a predefined upper and lower bound. Constraints #2 and #3

combine to ensure that this relationship is guaranteed. They assure that if an ad is not

served it will be bounded by 0 in both directions, and if an ad is served, its frequency

must be between the lower and upper bounds. If an ad is served, constraint #2 dominates

constraint #3 and the frequency is thus constrained by the lower and upper bounds. If an

ad is not served, constraint #3 dominates constraint #2, which forces the frequency to 0.

Thus, these constraints prevent any number of impressions that is not either 0 or between

the bounds. We acknowledge that this represents a slight variation from the model

solved in Adler et al. [6], Kumar et al. [5, 13], Freund et al. [9]. In the model presented


in those papers, constraints #2 and #3 are presented as $\sum_{j=1}^{N} x_{ij} = w_i y_i$, i = 1, 2, ..., n, which


assures that the ad is served exactly the prescribed number of times over the planning

horizon. Another slight variant is proposed by Amiri and Menon [10]. In their proposed

formulation, constraint #2 and #3 are replaced by wlyl <: U, y,, i = 1, 2,...n, providing an










upper bound on the number of times the ad is served. Within the industry it is very

common for an advertisement to have an upper and a lower bound on the frequency;

therefore, we have adapted our formulation accordingly. We could have alternatively

used a "fixed charge" approach to link \sum_{j=1}^{N} x_{ij} and y_i, of the form

\sum_{j=1}^{N} x_{ij} \le M y_i, \quad i = 1, 2, \ldots, n.

Note, however, that this would necessitate including a penalty on the y_i variables in the

objective function. The binary definition of the x_{ij} decision variables ensures that the

same ad cannot be displayed multiple times in a given banner slot. This model is based

on the assumption that the revenue generated increases linearly in the volume of the ad.

This assumption will be relaxed in our last model.
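
To make the formulation concrete, the following short Python sketch expresses the MMS

integer program using the open-source PuLP modeling library. The instance data (n, N, S,

s_i, L_i, U_i) are hypothetical values chosen purely for illustration; they are not drawn

from our test problems.

    # A minimal sketch of the MMS integer program; instance data are hypothetical.
    import pulp

    n, N, S = 4, 6, 10                  # ads, banner/time slots, banner height
    s = [3, 4, 5, 6]                    # ad heights s_i
    L = [2, 1, 2, 1]                    # lower frequency bounds L_i
    U = [5, 4, 3, 2]                    # upper frequency bounds U_i
    M = N + 1                           # big-M constant, larger than the slot count

    prob = pulp.LpProblem("MMS", pulp.LpMaximize)
    x = pulp.LpVariable.dicts("x", (range(n), range(N)), cat="Binary")
    y = pulp.LpVariable.dicts("y", range(n), cat="Binary")

    # Objective: maximize the total amount of advertising space used.
    prob += pulp.lpSum(s[i] * x[i][j] for i in range(n) for j in range(N))

    for j in range(N):                  # constraint (1): slot capacity
        prob += pulp.lpSum(s[i] * x[i][j] for i in range(n)) <= S
    for i in range(n):
        freq = pulp.lpSum(x[i][j] for j in range(N))
        prob += freq >= L[i] - M * (1 - y[i])  # constraint (2), lower side
        prob += freq <= U[i] + M * (1 - y[i])  # constraint (2), upper side
        prob += freq <= M * y[i]               # constraint (3); the -M*y_i side is
                                               # redundant since the x_ij are binary

    prob.solve()
    print(pulp.LpStatus[prob.status], pulp.value(prob.objective))

Of course, such an exact solver is only practical for small instances; the NP-hardness of

the problem motivates the heuristic approaches developed below.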

Many publishers still follow a variant of this basic model, making the Maxspace

problem very popular in the research literature. Adler et al. [6] prove that the ad

scheduling problem is NP-hard. Therefore, it is highly unlikely that it can be solved by

an efficient optimization algorithm [50]. As noted above, many researchers, including

Freund and Naor [9], Amiri and Menon [10], and Kumar et al. [5, 13], have proposed

approximation solution techniques for the Maxspace problem, although none have solved

the variation presented above. We will extend this line of research by proposing and

testing several heuristic and metaheuristic approaches for the proposed variation of the

Maxspace problem. In addition, we also propose two additional models which extend









this basic model but are considerably more demanding in terms of computational

complexity.

5.3.2 The Modified Maxspace Problem with Ad Targeting (MMSwAT)

Given the industry migration towards performance based pricing models, ad

targeting has become a focal point of discussion, and is considered by many to be the

most important effort in online advertising [121]. Ad targeting is an industry term used

to describe any effort to deliver an advertisement to a subset of the advertisement time

slots based on an estimation of the likelihood that the user who is viewing those ad slots

will act on it. In targeting, it would be ideal for a publisher to have an accurate

probability matrix indicating the probability for each of their time slots that a given user

would perform the action of interest (clicking, for example) for each of their

advertisements. This would allow the publisher to be very precise in their targeting of

each individual ad; however, as a result of privacy laws and the large number of time

slots, advertisements, and users, development and maintenance of such a matrix is

considered, in most cases, to be impractical if not impossible. Instead, advertisements are

commonly targeted to a subset or cluster of the publisher's time slot population. These

clusters are chosen in an effort to maximize inclusion of the advertiser's target audience.

For example, one very popular method is to cluster based on geographic regions. A

company's target audience is often geographically concentrated in certain areas. When

this is the case, it is logical to target that company's ads to users/time slots which are

located in those local regions. This technique has recently grown substantially in

popularity. The current projected year-over-year revenue growth rate for local online

advertising is approximately 50%, which is more than twice the projected growth rate for

online advertisement spending as a whole [122]. Other popular methods of time slot









clustering include clustering based on a Web site's page content, the time of day, the day

of week, a user's bandwidth, Nielsen's DMA regions, etc. Publishers often use these

methods simultaneously in an effort to improve performance. Earlier in this work, we

discussed our proposed efforts to provide another good alternative clustering method

based on the application of IR techniques (see section 5.2).

As the first of our two extensions to our modified Maxspace model, we model the

scenario where a publisher has chosen a hybrid pricing model and is therefore employing

some type of advertisement targeting. Incorporation of the hybrid pricing model is very

important because it is very popular in industry; however, it has thus far received very

little attention in the academic research literature. We do not distinguish between the

different methods of targeting; instead our model is focused only on step two of this

process, assuming that the targeting method(s) have been previously chosen and that the

users/time slots have been clustered accordingly. We acknowledge that the proposed

model may require slight adjustments depending on which of the targeting methods are

chosen in step 1. However, we have attempted to make the model as general as possible

in an effort to increase its scope of applicability. We now provide a basic description of

the model extension.

The major difference in this model is the incorporation of clusters of time slots

which would be based on the chosen targeting effort(s) from the first stage of the

problem. As an example, if content targeting was the only chosen targeting method, the

clusters would be formed based on the content of the Web page (i.e. one cluster for time

slots on the sports page, one cluster for slots on the music page, one cluster for slots on

the news page, etc). These designations would obviously be different for other methods









of targeting. Given that one cluster is the full ad set which includes all of the time slots,

each time slot must be assigned to at least one cluster; however, each slot could also be

assigned to other clusters. Likewise, each advertisement must be targeted to at least one

cluster, but may be targeted to multiple clusters. A two-dimensional input matrix C_{ij} is

used to manage the cluster assignments for each ad/time slot combination. The input

elements of C_{ij} equal 1 if ad i and time slot j have at least one cluster in common. For

example, assume that time slot j represents a banner on the sports page and therefore has

been designated as being part of the sports cluster. Likewise, assume that ad i is an

advertisement for tennis rackets and that consequently the advertiser and publisher have

decided to target it to time slots in the sports cluster. In this case, since ad i and time slot

j have the sports cluster in common, the C_{ij} entry for this ad/time slot combination

would equal 1. Had this ad not been targeted to the sports cluster, this C_{ij} matrix entry

would have been 0 unless they had another cluster in common (which would be the case

if this slot was, for example, also assigned to the "outdoors" cluster and the decision was

made to target tennis racket ads to the outdoors cluster). The IP formulation of the

problem is as follows:










Maximize \sum_{j=1}^{N} \sum_{i=1}^{n} s_i x_{ij}

subject to:

(1) \sum_{i=1}^{n} s_i x_{ij} \le S, \quad j = 1, 2, \ldots, N

(2) L_i - M(1 - y_i) \le \sum_{j=1}^{N} x_{ij} \le U_i + M(1 - y_i), \quad i = 1, 2, \ldots, n

(3) -M y_i \le \sum_{j=1}^{N} x_{ij} \le M y_i, \quad i = 1, 2, \ldots, n

(4) x_{ij} \le C_{ij}, \quad \forall i, j

(5) x_{ij} = 1 if ad i is assigned to ad slot j, and 0 otherwise

(6) y_i = 1 if ad i is assigned, and 0 otherwise

Where:

n : number of advertisements
N : number of banner/time slots
S : banner height
s_i : height of advertisement i, i = 1, 2, ..., n
L_i : lower limit on the frequency of ad i, i = 1, 2, ..., n
U_i : upper limit on the frequency of ad i, i = 1, 2, ..., n
M : a large constant greater than the number of banner/time slots
C_{ij} : equals 1 if ad i and time slot j have at least one cluster in common
and 0 otherwise, \forall i, j


The first constraint ensures that the cumulative area consumed by the ads assigned to any

given slot is within the assigned space limitation S of that slot. The second and third

constraints ensure that, if an ad is chosen, the number of delivered impressions of that ad

falls within the contracted upper and lower limits. The new constraint, constraint #4,

ensures that each ad, if served, is only served to time slots which belong to clusters to









which the ad has been targeted. Constraint #5 ensures that at most one copy of each ad

can appear in any given slot.
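
To illustrate how the C_{ij} input matrix can be derived from cluster assignments in

practice, consider the following Python sketch; the cluster labels and data structures are

hypothetical.

    # Hypothetical cluster assignments for three ads and three time slots.
    ad_clusters = {0: {"sports"}, 1: {"music"}, 2: {"sports", "outdoors"}}
    slot_clusters = {0: {"sports"}, 1: {"news"}, 2: {"outdoors"}}

    # C[i, j] = 1 if ad i and time slot j share at least one cluster, 0 otherwise.
    C = {(i, j): int(bool(ad_clusters[i] & slot_clusters[j]))
         for i in ad_clusters for j in slot_clusters}

    # Constraint (4) of the model, x_ij <= C_ij, then forbids serving an ad to
    # any time slot outside the clusters to which the ad has been targeted.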

5.3.3 The Modified Maxspace Problem with Non-Linear Pricing (MMSwNLP)

For our last extension, we extend the Modified Maxspace Problem to also include a

non-linear pricing function.

Prior modeling efforts have assumed that advertisers are charged a constant rate per

unit volume of their advertising regardless of their level of commitment. This is easily

represented in the model formulation; however, in many cases, it does not accurately

reflect the pricing behavior seen in industry. In an effort to entice advertisers to commit

to a larger volume of advertising, instead of using a constant pricing structure, publishers

often offer a series of price breaks. From a publisher's standpoint, the obvious goal is to

increase overall revenue by improving the efficiency of ad space utilization. These price

breaks are commonly implemented via a step pricing function, with the overall

continuum of volume commitments being subdivided into a series of ranges, each range

having its own price per unit volume. The size of the ranges and the relative prices per

unit volume will obviously differ from publisher to publisher.
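
As a simple illustration, the following Python sketch implements one possible step

pricing function f_i; the break points and rates are hypothetical, and we assume an

all-units discount in which the rate of the attained range applies to the entire committed

volume.

    # Hypothetical step pricing function: the per-unit rate drops as the committed
    # volume crosses each break point (all-units discount assumed).
    def step_price(volume: float) -> float:
        breaks = [(0, 1.00), (100, 0.90), (500, 0.75)]  # (threshold, rate) pairs
        rate = next(r for t, r in reversed(breaks) if volume >= t)
        return rate * volume

    # Example: step_price(80) -> 80.0, step_price(200) -> 180.0, and
    # step_price(600) -> 450.0, so revenue is no longer linear in volume.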

We extend our previous model to allow for these additional complexities and to

provide alternative solution methods for the resulting problem. We now provide the

associated IP formulation for this problem.










Maximize \sum_{i=1}^{n} f_i\left( s_i \sum_{j=1}^{N} x_{ij} \right)

subject to:

(1) \sum_{i=1}^{n} s_i x_{ij} \le S, \quad j = 1, 2, \ldots, N

(2) L_i - M(1 - y_i) \le \sum_{j=1}^{N} x_{ij} \le U_i + M(1 - y_i), \quad i = 1, 2, \ldots, n

(3) -M y_i \le \sum_{j=1}^{N} x_{ij} \le M y_i, \quad i = 1, 2, \ldots, n

(4) x_{ij} = 1 if ad i is assigned to ad slot j, and 0 otherwise

(5) y_i = 1 if ad i is assigned, and 0 otherwise

Where:

n : number of advertisements
N : number of banner/time slots
S : banner height
s_i : height of advertisement i, i = 1, 2, ..., n
L_i : lower limit on the frequency of ad i, i = 1, 2, ..., n
U_i : upper limit on the frequency of ad i, i = 1, 2, ..., n
M : a large constant greater than the number of banner/time slots
f_i(\cdot) : the non-linear step function of price per unit volume, \forall i

The first constraint ensures that the cumulative area consumed by the ads assigned to any

given slot is within the assigned space limitation S of that slot. The second and third

constraints ensure that, if an ad is chosen, the number of delivered impressions of that ad

falls within the contracted upper and lower limits. Constraint #4 ensures that at most one

copy of each ad can appear in any given slot.

One subtlety of the last two models bears explanation. We made the claim that the

last two models were designed to help publishers improve their performance under a

hybrid pricing model; however, neither of the models specifically optimizes over any of









the mentioned performance based measures. The explanation lies in the fact that

although the formulation indicates that we are only optimizing over the amount of space

which is used, similar to the objective function of the base Maxspace problem, efforts by

the publishers to improve their performance with respect to the performance based

measures are implicitly incorporated through the first-stage targeting and non-linear pricing

strategies. This is the common practice in industry.

5.4 Model Solution Approaches

The three online ad scheduling problems which have been introduced are NP-hard

combinatorial optimization problems that are addressed on a daily basis by many online

advertising publishers. Their NP-hard nature makes it highly unlikely that they will ever

be solved by an efficient optimal algorithm. Therefore, efficient and effective

approximation algorithms are necessary. In an effort to further the search for such

algorithms, we propose and test several heuristic and metaheuristic approaches for each

problem. Our metaheuristic approaches will be based on the popular genetic algorithm

and neural network methodologies.

5.4.1 Augmented Neural Network (AugNN)

We have chosen a very promising variation of the traditional neural network

architecture, the Augmented Neural Network (AugNN), which was first introduced by

Agarwal [106] in 2003. The AugNN approach augments the traditional neural network

model to allow for the embedding of domain and problem-specific knowledge via a base

heuristic. This approach takes advantage of both the heuristic approach and the iterative

local-search approach. AugNN utilizes a proven base heuristic to exploit the problem

specific structure and then iteratively searches the local neighborhood randomly in an

effort to improve upon the initial solution. The chosen base heuristic provides the

algorithm with a starting solution in a neighborhood of the search space.
After at most n iterations, which together constitute an epoch k, a feasible solution is generated. A series

of relative weights, one of which is associated with each advertisement i, are modified

after each epoch in an attempt to produce improved neighborhood solutions in future

epochs. If improvements are found from epoch to epoch, the weights may be reinforced.

In addition, if an improved solution is not identified within a predetermined number of

epochs, the algorithm has the ability to backtrack. As a result of its successful use of

domain specific knowledge, this technique seems to avoid being trapped by extremely

inefficient local optima, a problem that has plagued many other neural network based techniques.

In addition, because it does not involve any relaxations or LP solutions upon which many

other alternative techniques are dependent, it is a relatively simple technique in terms of

computational complexity. In order to apply this technique, we need a good base

heuristic for each of the three problems; the notation for the AugNN procedure is given below, followed by the chosen base heuristics.



AugNN Notation

RF : reinforcement factor
BF : backtracking factor
\alpha : learning rate coefficient
k : epoch number
\omega_i : weight associated with ad i, i \in A
E_k : error, i.e., the difference between the current ad schedule value and the upper
      bound in epoch k (upper bound = total quantity of available ad space)
SF : stop factor
MF : learning rate multiplicative factor
NBA : number of backtracks allowed









After testing several alternatives, we employ a largest volume least full (LVLF) heuristic for the Modified Maxspace and

the Modified Maxspace with Non-Linear Pricing problems and a Subset-LVLF heuristic

for the Modified Maxspace with Ad Targeting problem. Both heuristics are described

below. The LVLF heuristic is very similar to the LSLF heuristic which was introduced

by Adler et al. [6]; the only difference is that the ads are sorted based on volume

instead of size. We test the heuristics both independently and in combination with the

AugNN procedure for each problem.

Largest Volume Least Full (LVLF) Algorithm

* Sort the ads in descending order of volume, utilizing the upper frequency bound for
the volume calculation of each ad.

* Assign each of the ads in sorted order. If feasible, assign ad i to the least full slots
one at a time until either we reach a time slot which has insufficient capacity to
accept ad i or the upper frequency bound for ad i, U_i, is reached. (A sketch of this
procedure follows.)
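
The following Python sketch illustrates the LVLF assignment logic; the variable names

are illustrative, and details such as the treatment of ads whose lower frequency bound

cannot be met are our assumptions.

    # A minimal LVLF sketch: sort ads by volume (height x upper frequency bound),
    # then greedily fill the least-full (most remaining space) slots first.
    def lvlf(heights, L, U, N, S):
        remaining = [S] * N                    # free space left in each slot
        schedule = [[] for _ in range(N)]      # ads assigned to each slot
        order = sorted(range(len(heights)),
                       key=lambda i: heights[i] * U[i], reverse=True)
        for i in order:
            feasible = [j for j in range(N) if remaining[j] >= heights[i]]
            if len(feasible) < L[i]:           # lower frequency bound unreachable,
                continue                       # so the ad is not served at all
            feasible.sort(key=lambda j: remaining[j], reverse=True)
            for j in feasible[:U[i]]:          # stop at the upper frequency bound
                schedule[j].append(i)
                remaining[j] -= heights[i]
        return schedule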

Subset Largest Volume Least Full (Subset-LVLF) Algorithm

* Classify the ads into two subsets based on their target id. Some of the ads will be
targeted to a specific set of time slots while others are untargeted and can be served
to any available slot. If an ad is targeted, it is placed in subset D_t; otherwise, it is
placed in subset D_u.

* Sort the ads in descending order of volume, utilizing the upper frequency bound for
the calculation.

* Utilizing the LVLF algorithm, assign the ads from subset D_t and then from subset
D_u, as long as there is sufficient space available.

The method by which we modify the augmented neural network weights is

determined by the learning strategy. The learning strategy consists of the weight

modification formula and any additional methods which are chosen. We employ the

following learning strategy:

a. Weight modification formula

\omega_i(k+1) = \omega_i(k) + \alpha \cdot s_i \cdot E_k, \quad \forall i \in A









b. Additional methods

In addition, we employ reinforcement and backtracking mechanisms, as
detailed in the algorithm below, to improve the solution quality.

Our strategy is predicated on the theory that if the error in an epoch is too high, the order

in which the ads are selected for assignment during the following epoch should be

changed more than if the error were smaller.
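
A small Python sketch of this learning strategy is given below. The additive form of the

update and the subsequent re-sorting step reflect our reading of the formula in (a) above,

and should be treated as assumptions.

    # Sketch of the AugNN weight update: each ad's weight moves in proportion to
    # its size and the epoch error, so a larger error reshuffles the order more.
    def update_weights(w, heights, error, alpha):
        # w: weights omega_i; heights: s_i; error: E_k; alpha: learning rate
        return [w_i + alpha * s_i * error for w_i, s_i in zip(w, heights)]

    # The next epoch's LVLF pass would then rank the ads by weighted volume:
    # order = sorted(range(len(w)), key=lambda i: w[i] * heights[i] * U[i],
    #                reverse=True)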

AugNN Algorithm


Step 1: Initialize RF, BF, SF, NBA, \alpha, MF, the weights \omega_i, and k

Step 2: Run LVLF or Subset-LVLF (for the Modified Maxspace with Ad Targeting
problem)
Step 3: Set t = 1, x = 1, z = 0, and y = 1

Step 4: Evaluate the fitness of the initial solution based on the objective function of the
respective problem.

Step 5: Modify the AugNN weights via the weight modification formula discussed in (a)
above.

Step 6: Run LVLF or Subset-LVLF (for the Modified Maxspace with Ad Targeting
problem) and set x = x+1.

Step 7: Evaluate the fitness of the new ad schedule and check its uniqueness. If it is
unique, set t = t+1.

Step 8: If t > (number of desired unique solutions) or x > (SF × number of desired
unique solutions), terminate and report the best solution so far.
Step 9: If the fitness of the current ad schedule > the best solution so far, reinforce the
current set of AugNN weights by replicating the last set of weight modifications
RF times. Set y = 1 and return to step 6.

Step 10: If y < BF, modify the AugNN weights via the weight modification formula
discussed in (a) above, set y = y+1 and return to step 6.

Step 11: If y > BF and z < NBA, set y = 1, modify the AugNN weights by resetting
them to the best set of weights thus far, set z = z+1 and return to step 6.

Step 12: If y > BF and z > NBA, set z = 0, set \alpha = \alpha × MF, modify the AugNN










weights by resetting them to the best set of weights thus far, set z = z+1 and
return to step 6.

5.4.2 Genetic Algorithm (GA)

We also employ a genetic algorithm (GA) based approach. For the three proposed

problems, each GA chromosome, which can be visualized as a 1 x n vector as depicted

below, represents a candidate sequence of n advertisements A = \{a_1, a_2, \ldots, a_n\}.


a2 a4 a1 a5 a3


The advertisements are served in the order in which they appear in the respective

chromosome. For example, given the basic five ad chromosome string depicted above,

the GA would first attempt to serve ad 2, then ad 4, then ad 1, etc. When attempting to

serve a given ad, if there are not at least L_i (lower frequency bound for ad i) time slots

with sufficient capacity to accommodate the ad, it is not served at all. Those ads which

do meet this feasibility requirement are served until either their upper frequency bound is

reached or an attempt has been made to place the respective ad in each time slot. The

associated fitness value is measured based on the objective function of the given problem.

The three primary operations of a simple genetic algorithm are reproduction, mutation

and crossover. We employ a roulette wheel reproduction method, a one-point crossover

and a basic ad swap mutation operator.



GA Notation

e : elite percentage

p_m : probability of mutation

ps : population size

NU : number of desired unique solutions









CL : crossover attempt limit


The GA begins with an initial population of strings which are all created randomly

with the exception of one string which is created using the LVLF or Subset-LVLF

heuristic (depending on which problem is being solved). Between generations we use the

elite percentage (e) to determine how many of the most fit strings will survive

unchanged into the next generation. The roulette wheel reproduction operator selects

potential reproductive parental strings based on their relative fitness values. Each string

has a probability of selection which is directly proportional to the ratio of its fitness value

divided by the sum of the fitness values of the entire population. The most fit strings are

thereby given the highest probability of selection. Given the binary nature of ad selection

in all three of the proposed ad scheduling problems, any ad duplication within a proposed

solution string causes it to be infeasible. As a result, common GA selection and

crossover mechanisms struggle to achieve an acceptable level of feasibility for these

problems. To overcome this challenge, we use a crossover mechanism developed by

Kumar et al. [67] which ensures the feasibility of each new offspring. Having selected

two parent strings via the roulette wheel process described above, a single crossover point

is randomly selected. In the example depicted in Figure 8, point number five, which

falls between ads five and six, was selected. Based on the chosen crossover point and the

genetic material of the parents, two children strings are created. The genetic material on

the left side of the crossover point in parent 1 is then directly inherited by child 1 and

similarly for parent 2 and child 2. In our example (see Figure 9), the first set of ads

which are inherited by child one are ads a7, a4, a11, a5 and a8. Up to this point, the

proposed crossover method has followed the basic single point crossover process;










however, the remainder of the process is somewhat different. Unlike the traditional

mechanism, the second half of the genetic material which makes up the chromosome

string of child one is not directly inherited from parent two. Instead, the ads which make

up the second half of child one's string are inherited from the second half of parent one

with the caveat being that they are re-ordered based on how they appear in parent two. A

similar process is followed for child two. In our basic example, the ads which make up

the second half of child one are ads a9, a10, a3, a6, a1 and a2, but they are reordered

based on how they appear in parent two (i.e., a2, a1, a10, a9, a3 and a6). This

reproduction process has created two new offspring for the next generation. However,

before being added into the next population, the new offspring are given an opportunity

to mutate based on the pre-defined probability of mutation operator (p_m). A string which

is selected for mutation will have two randomly selected ads swap places within the

string. In the example below (see Figures 10 and 11), it is assumed that the second

child has been selected for mutation and ads a8 and a11 have been randomly selected as

mutation candidates. This entire process is repeated from generation to generation until a

predefined number of unique solutions have been created or the crossover attempt limit

has been exceeded.






Parent 1:  a7  a4  a11  a5  a8 | a9  a10  a3  a6  a1  a2

                               (crossover point)

Parent 2:  a2  a8  a1  a5  a10 | a9  a3  a11  a6  a4  a7

Figure 8: Selected Parents Prior to Crossover












Child 1:  a7  a4  a11  a5  a8  a2  a1  a10  a9  a3  a6

Child 2:  a2  a8  a1  a5  a10  a7  a4  a11  a9  a3  a6

Figure 9: Resulting Offspring



(ads a8 and a11 randomly selected for mutation)

Child 2:  a2  a8  a1  a5  a10  a7  a4  a11  a9  a3  a6

Figure 10: Child 2 Prior to Mutation

Child 2:  a2  a11  a1  a5  a10  a7  a4  a8  a9  a3  a6

Figure 11: Child 2 After Mutation
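
The following Python sketch captures the roulette wheel selection, the

feasibility-preserving crossover described above (after Kumar et al. [67]), and the swap

mutation; function and variable names are illustrative.

    import random

    def roulette_select(population, fitness):
        # Selection probability proportional to each string's share of total fitness.
        r = random.uniform(0, sum(fitness))
        acc = 0.0
        for string, f in zip(population, fitness):
            acc += f
            if acc >= r:
                return string
        return population[-1]

    def crossover(p1, p2):
        # One-point crossover: each child keeps its own parent's left segment; the
        # remaining ads are re-ordered by their appearance in the other parent.
        cut = random.randint(1, len(p1) - 1)
        c1 = p1[:cut] + [a for a in p2 if a not in p1[:cut]]
        c2 = p2[:cut] + [a for a in p1 if a not in p2[:cut]]
        return c1, c2

    def mutate(child):
        # Swap two randomly chosen ads within the string.
        i, j = random.sample(range(len(child)), 2)
        child[i], child[j] = child[j], child[i]
        return child

Because both parents are permutations of the same ad set, each child produced by this

crossover is itself a duplicate-free permutation, which is what guarantees feasibility.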


GA Algorithm

Step 1: Initialize e, p_m, ps, NU and CL
Step 2: Apply LVLF or Subset-LVLF (for the Modified Maxspace with Ad Targeting
problem) and insert the resulting solution as the first string in the initial GA
population
Step 3: Fill the initial population by creating (ps - 1) random chromosomes
Step 4: Set t = 1 and c = 0.
Step 5: For each string, attempt to assign each of the ads in the order in which it appears
in the string. If feasible, assign ad i to the least full slots one at a time until
we either reach a time slot which has insufficient capacity to accept ad i or the

upper frequency bound for ad i, U_i, is reached. Evaluate the fitness of each

string based on the objective function of the respective problem. Check each
chromosome for uniqueness. For each unique string, set t = t+1

Step 6: Sort the strings in descending order of their relative fitness values
Step 7: Populate the elite list by selecting the best (e × ps) strings based on their
relative fitness values. These strings are added to the next population.

Step 8: Utilizing the roulette wheel selection method, select two parent strings for










reproduction and cross them over. Set c = c+1.
Step 9: Mutate the resulting children based on the mutation probability.
Step 10: Add the children strings to the next population and test them for uniqueness.
For each child that is unique, set t = t+1.

Step 11: If t > NU or c > CL, calculate the fitness value for those strings in the new
population and terminate reporting the best solution so far.
Step 12: If the number of chromosomes in the next population > ps, go to step 5;
otherwise, go to step 8.

5.4.3 Hybrid Technique

Both the AugNN and GA methods are expected to perform reasonably well on the

three proposed problems; however, in some cases multiple methods can be combined to

leverage the best aspects of each technique. Based on this intuition, our final

solution approach for the three proposed problems will be a new hybrid technique which

combines the AugNN and GA methods. The hybrid method employs the AugNN method

in an effort to search the best local neighborhood which has been discovered after each

generation of the genetic algorithm. If the AugNN is able to find an improved

solution, the respective solution is then fed into the next population of the GA; otherwise,

the GA proceeds normally. This process repeats until the desired number of unique

solutions have been found.

Hybrid Notation
NUA : number of desired unique AugNN solutions
NU : number of desired unique solutions (total)



Hybrid Algorithm

Step 1: Run one generation of the genetic algorithm as described.
Step 2: Develop a set of AugNN weights which replicates the best ad schedule
discovered by the GA










Step 3: Run the AugNN until the number of unique AugNN solutions > NUA.
Step 4: If the AugNN process improves upon the current best solutions, feed the
associated ad schedule as a string into the next population of the GA.

Step 5: Repeat steps 1-4 until the number of unique solutions > NU.

5.4.4 Parameter Selection

As discussed by Aytug et al. [66], one of the biggest concerns with respect to the

application of these black box type algorithms, such as neural networks and genetic

algorithms, is the absence of theoretical guidance with respect to the methods by which

the parameter settings should be selected. The techniques are very powerful and

extremely popular, but their effectiveness may vary considerably based on the ability to

find good values for each technique's numerous algorithmic parameters. For the

GA based methods, these include population size,

mutation probability and elite list percentage. For the AugNN based methods, these

include the learning rate, backtracking factor, reinforcement factor, learning rate

multiplicative factor and the number of backtracks allowed. For a more detailed

explanation of each of these parameters, please see appendix A.

In developing a method of parameter selection, researchers are often enticed to

utilize the widely criticized practice of parameter tuning. In doing so, they tune the

parameters to different settings for each problem set. This technique may provide

improved results, but is often impractical in industry and also brings into question any

assumptions that are made concerning the generalizability of the technique. We avoid

this practice. Alternatively, in an effort to gain a better understanding of the robustness of

each of the proposed techniques for the problems introduced, we maintain a consistent set

of parameter settings across all of the problem sets for each of the three problem










instances. In determining which parameter setting to use for each problem instance, we

use prior applications of the techniques to provide guidance, and then select a good set of

parameter settings for our techniques based on a series of pilot runs. For each of the three

problems, our pilot runs consisted of 54 instances, 2 arbitrarily

selected from each of the problem sizes described in the next section. The final parameter

sets which were utilized for the project are described in Tables 2-4.

Table 2: AugNN Parameter Values
Problem Unique Sol LR BF NBA RF SF
MMS 300 0.003 8 7 3 10
MMSwAT 300 0.001 8 5 2 10
MMSwNLP 300 0.05 8 5 2 10


Table 3: GA Parameter Values
Problem Uniq Soln PS Mut Prob Elit % Cross Limit
MMS 300 80 0.05 0.25 400
MMSwAT 300 80 0.05 0.35 400
MMSwNLP 300 80 0.1 0.1 400


Table 4: Hybrid Parameter Values
Problem    Uniq Soln  GA PS  Mut Prob  Elit %  Cross Lim  AugNN LR  AugNN Uniq Soln  SF
MMS        300        40     0.05      0.25    400        0.01      150              5
MMSwAT     300        40     0.05      0.35    400        0.005     150              5
MMSwNLP    300        40     0.1       0.1     400        0.05      150              5


5.5 Problem Set Development

For each of the three problems, we needed a good set of test problems. We wanted

to develop a strong set of problems which would give us, and researchers that follow, the

opportunity to evaluate the relative effectiveness and scalability of proposed solution

methodologies. To achieve this goal, for each of the three problems, we created 27

problem sets of different sizes and difficulties. Each problem set, which consists of 10









individual problems, has a predetermined number of time slots N which are of a

predetermined size S. If we follow the common precedent of prior researchers and

assume that the ads are flipped once per minute, the planning horizon covered by our test

problems ranges from a half of an hour to an entire day. Prior work on the Maxspace

problem had limited the planning horizon to 100 minutes. We chose to expand this

horizon in an effort to appeal to a larger set of publishers. The size s_i and frequency

bounds U_i and L_i of ad A_i in any test problem are randomly generated, with the ad sizes

varying uniformly between S/3 and 2S/3, where S is the size of the time slots for that

particular problem. In their work on the Maxspace problem, Kumar et al. [6] discovered

that the utilization of this method for ad sizes fosters the creation of more difficult

problems; therefore, we also employ it in our work. For the Modified Maxspace with Ad

Targeting problem set, each time slot is assigned a target id between 1 and 3. Similarly

each advertisement is also randomly assigned a target id between 1 and 4. An

advertisement which is assigned a target id of four is considered to be an untargeted ad

which can be served in any available time slot. All of the other ads are targeted and can

only be served to time slots which match their target id.
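
The following Python sketch shows one way to generate a test instance under this

scheme. The frequency-bound distribution is not fully specified above, so the one used

here is an assumption.

    import random

    # Hypothetical instance generator: ad sizes are uniform on [S/3, 2S/3]; slot
    # target ids lie in 1..3 and ad target ids in 1..4, with 4 denoting an
    # untargeted ad. The frequency-bound scheme below is an assumption.
    def make_instance(n, N, S, targeted=False, seed=0):
        rng = random.Random(seed)
        sizes = [rng.randint(S // 3, 2 * S // 3) for _ in range(n)]
        lows = [rng.randint(1, N // 2) for _ in range(n)]   # assumed L_i scheme
        ups = [rng.randint(lo, N) for lo in lows]           # assumed U_i scheme
        inst = {"sizes": sizes, "L": lows, "U": ups, "S": S, "N": N}
        if targeted:
            inst["slot_target"] = [rng.randint(1, 3) for _ in range(N)]
            inst["ad_target"] = [rng.randint(1, 4) for _ in range(n)]
        return inst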















CHAPTER 6
RESULTS

In this chapter we provide the results of our empirical tests for both the IR Based Ad

Targeting and the Online Advertisement Scheduling sections of the project.

6.1 Information Retrieval Based Ad Targeting Results

In this section, we report the results of our information retrieval based ad targeting

experiments. Each user was asked to rank their level of interest on a scale of 1 to 5 for a set

of ads, some of which were selected randomly and the remainder of which was selected

based on one of the three weighting schemes. As discussed in section 5.2, within the

framework of our ad targeting process, we tested three different weighting schemes in an

effort to identify the best html structural element weighting combination. The tested

schemes are detailed in Table 1, section 5.2.

The relative effectiveness of each of the advertisement selection methods was

determined based on the mean score of the user rankings for the associated set of ads. Since

the underlying structure of the ranking scale is such that a higher score indicates an

increased level of interest, we are assuming that a method which selects a group of ads

which have a higher mean user ranking score is more effective than the alternative.

Throughout this section, we use the unpaired T test to evaluate the statistical significance of

the difference in means of user rankings between sets of selected ads. An important

assumption of the T test is that the dependent variable is normally distributed. In our

analysis, the student rankings represent the dependent variable, and based on the results of

the Q-Q plot, which is given in Figure 12, we are relatively confident that this assumption








has been met. In addition, utilization of the T test requires a careful analysis of the

respective variances of the compared data sets. We utilized Levene's test of equal variances

for this part of the analysis. If the significance level of Levene's test is greater than or

equal to .05, the 'equal variances assumed' row of the table is applicable; otherwise, the

'equal variances not assumed' row must be used to determine the significance of the

associated T test.
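
In code, this two-stage procedure can be sketched as follows; the use of SciPy here is

our illustration, not necessarily the software used for the original analysis.

    from scipy import stats

    def compare_rankings(scheme_scores, random_scores, alpha=0.05):
        # Stage 1: Levene's test decides whether equal variances may be assumed.
        _, levene_p = stats.levene(scheme_scores, random_scores)
        equal_var = levene_p >= alpha
        # Stage 2: unpaired t test, pooled or Welch depending on stage 1.
        t, p = stats.ttest_ind(scheme_scores, random_scores, equal_var=equal_var)
        return t, p, equal_var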


[Figure: Normal Q-Q plot of the student ratings; observed values (0 to 6) on the horizontal axis]

Figure 12: Q-Q Plot of Student Response Values

We first compare the effectiveness of the proposed IR based targeting method with a

random selection process. The output of the IR Based Ad Targeting model is a set of

weights/scores (please see section 5.2 for a more detailed discussion of how the

weights/scores are assigned), one of which is assigned to each advertisement in the corpus.

Based on the model design, a higher weight implies a greater fit between the given product

and the interests of the respective user; therefore, the ads are served to a given user in










descending order of their weight/score. We acknowledge that depending on the size of a

publisher's ad corpus and the length of surfing time for a particular user, the percentage of

ads which may be served to a user will vary; however, for this part of our experiment we

assume that each user is served exactly 20% of our advertisement corpus which consists of

100 ads. Based on this assumption, the student rankings for the top 20 ads selected by each

of the IR methods (one set for each weighting scheme) are compared with the rankings for

20 randomly selected ads. Table 5 below provides a summary of the mean student rank

values for each of the ad selection methods. The detailed T test results for this analysis can

be found in Tables 6-8.

Table 5: Summary of Mean Student Rankings for the 4 Selection Methods
Ad Selection Method                              Mean Student Ranking
IR Based Method with Weighting Scheme 2 2.71
IR Based Method with Weighting Scheme 1 2.69
IR Based Method with Weighting Scheme 3 2.63
Random Selection Method 2.50




















Table 6: T Test - Scheme 1 & Random Selection
Group Statistics
            N      Mean    Std. Dev    Std. Error
Scheme 1    943    2.69    1.388       0.045
Random      853    2.50    1.361       0.047

Independent Samples Test
                        Levene's Test         t-test for Equality of Means
                        F       Sig.    t       df     Sig. (2-tailed)  Mean Diff  Std. Err  95% Conf Interval (Lower, Upper)
Equal var assumed       0.427   0.51    2.954   1794   0.003 **         0.19       0.065     (0.065, 0.319)
Equal var not assumed                   2.957   1782   0.003            0.19       0.065     (0.065, 0.319)


Table 7: T Test - Scheme 2 & Random Selection
Group Statistics
            N      Mean    Std. Dev    Std. Error
Scheme 2    942    2.71    1.394       0.045
Random      853    2.50    1.361       0.047

Independent Samples Test
                        Levene's Test         t-test for Equality of Means
                        F       Sig.    t       df     Sig. (2-tailed)  Mean Diff  Std. Error Diff  95% Conf Interval (Lower, Upper)
Equal var assumed       0.466   0.50    3.252   1793   0.001 **         0.21       0.065             (0.084, 0.340)
Equal var not assumed                   3.255   1783   0.001            0.21       0.065             (0.084, 0.339)











Table 8: T Test - Scheme 3 & Random Selection
Group Statistics
            N      Mean    Std. Dev    Std. Error
Scheme 3    938    2.63    1.381       0.045
Random      853    2.50    1.361       0.047

Independent Samples Test
                        Levene's Test         t-test for Equality of Means
                        F       Sig.    t       df     Sig. (2-tailed)  Mean Diff  Std. Error Diff
Equal var assumed       0.272   0.60    2.062   1789   0.039 **         0.13       0.065
Equal var not assumed                   2.064   1777   0.039            0.13       0.065