RURAL ROAD FEATURE EXTRACTION FROM AERIAL IMAGES USING ANISOTROPIC DIFFUSION AND DYNAMIC SNAKES

By

VIJAYARAGHAVAN SIVARAMAN

A THESIS PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE

UNIVERSITY OF FLORIDA

2004

Copyright 2004 by Vijayaraghavan Sivaraman

ACKNOWLEDGMENTS

I sincerely thank Dr. Bon A. Dewitt for his continuous support and encouragement throughout the course of this research. He provided much-needed technical help and constructive criticism by taking time out of his busy schedule. I thank Dr. Michael C. Nechyba for getting me started with the right background to do research in the field of image processing. I would like to thank Dr. Grenville Barnes and Dr. Dave Gibson for their patience and support, and their invaluable contribution in supervising my research. Finally, I wish to express love and respect for my parents, family, and friends. They are always with me.

TABLE OF CONTENTS

ACKNOWLEDGMENTS

LIST OF TABLES

LIST OF FIGURES

ABSTRACT

CHAPTER

1 INTRODUCTION
  1.1 Road-Feature Extraction Objectives and Constraints
  1.2 Feature Extraction from a Geomatics Perspective

2 BACKGROUND
  2.1 Road Characteristics
    2.1.1 Geometric
    2.1.2 Radiometric
    2.1.3 Topologic
    2.1.4 Functional
    2.1.5 Contextual
  2.2 Image-Processing Techniques
    2.2.1 Low-Level Processing
    2.2.2 Medium-Level Processing
    2.2.3 High-Level Processing
  2.3 Approaches to Road Feature Extraction
    2.3.1 Road Extraction Algorithm Using a Path-Following Approach
    2.3.2 Multi-Scale and Snakes Road-Feature Extraction
      2.3.2.1 Module I
      2.3.2.2 Module II
      2.3.2.3 Module III

3 ANISOTROPIC DIFFUSION AND THE PERONA-MALIK ALGORITHM
  3.1 Principles of Isotropic and Anisotropic Diffusion
  3.2 Perona-Malik Algorithm for Road Extraction
    3.2.1 Intra-Region Blurring
    3.2.2 Local Edge Enhancement
  3.3 Anisotropic Diffusion Implementation

4 SNAKES: THEORY AND IMPLEMENTATION
  4.1 Theory
    4.1.1 Internal Energy
    4.1.2 External Energy
    4.1.3 Image (Potential) Energy
      4.1.3.1 Image functional (Eline)
      4.1.3.2 Edge functional (Eedge)
      4.1.3.3 Term functional (Eterm)
    4.2.1 Dynamic Programming for Snake Energy Minimization
    4.2.2 Dynamic Programming
    4.2.3 Dynamic Snake Implementation

5 METHOD OF EXTRACTION
  5.1 Technique Selection
  5.2 Extraction Method
    5.2.1 Selection of Road Segments
    5.2.2 Image Diffusion
    5.2.3 Interpolation of Road Segments
    5.2.4 Diffused Road Segment Subset and Road Point Transformation
    5.2.5 Snake Implementation and Transformation of Extracted Road
  5.3 Evaluation Method
    5.3.1 Goodness of Fit
    5.3.2 F-Test

6 RESULT AND ANALYSIS
  6.1 Results
  6.2 Analysis of Result on Test Images

7 CONCLUSION AND FUTURE WORK
  7.1 Conclusion
  7.2 Future Work

APPENDIX

A MATLAB CODE FOR ROAD FEATURE EXTRACTION

B PROFILE MATCHING AND KALMAN FILTER FOR ROAD EXTRACTION

LIST OF REFERENCES

BIOGRAPHICAL SKETCH

LIST OF TABLES

Table

2-1 Image pixel subset
2-2 Convolution kernel
2-3 Methods of extraction
2-4 Module of extraction
4-1 Proposals
4-2 Stage 1 computation
4-3 Proposal revenue combination
4-4 Stage 2 computation
5-1 Stages of development
6-1 Summary of evaluation for extracted road features

LIST OF FIGURES

Figure

2-1 Road characteristics
2-2 Gaussian kernel
2-3 Edge detection
2-4 Sobel edge detector
2-5 Hough transform
2-6 Path-following approach
2-7 Road seed selection
2-8 Width estimation
2-9 Cost estimation
2-10 Road traversal at intersection
2-11 Global road-feature extraction
2-12 Salient road
2-13 Non-salient road
2-14 Salient road-feature extraction
2-15 Non-salient road-feature extraction
2-16 Road linking
2-17 Network completion hypothesis
2-18 Segment insertion
2-19 Extracted road segments
3-1 Anisotropic diffusion using Perona-Malik algorithm
3-2 Isotropic diffusion using Gaussian
3-3 Nonlinear curve
3-4 Square lattice example
4-1 Snaxel and snakes
4-2 Scale space representation of snake
4-3 Internal energy effect
4-4 Spring force representation
4-5 Dynamic snake movement
5-1 Input image for Hough transform
5-2 Extracted road using Hough transform
5-3 Input image for gradient snake extraction
5-4 Road extracted using gradient snakes
5-5 Road extracted using Gaussian and dynamic snakes
5-6 Perona-Malik algorithm and dynamic snakes
5-7 Process of road-feature extraction
5-8 Selection of road segment
5-9 Perona-Malik algorithm vs. Gaussian
5-10 Interpolated road points
5-11 Road segment subset and its transformed road point
5-12 Extracted road using Perona-Malik and dynamic snake algorithm
5-13 Desired and extracted road edges
6-1 Road extracted using Gaussian and Perona-Malik with dynamic snakes
6-2 Road extracted on test images

Abstract of Thesis Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Master of Science

RURAL ROAD FEATURE EXTRACTION FROM AERIAL IMAGES USING ANISOTROPIC DIFFUSION AND DYNAMIC SNAKES

By

Vijayaraghavan Sivaraman

December 2004

Chair: Bon A. Dewitt
Major Department: Civil and Coastal Engineering

The advent of information technology has led to the implementation of various engineering applications. The Geographic Information System (GIS) is one such application, used on a large scale in the field of civil engineering for the tracking and maintenance of roads.
Graphical representations of roads, including their attribute information, are stored in a GIS to track and maintain them. The graphical representation of road features is obtained through a process of digitization. Research groups over the past couple of decades have worked toward developing methods of extraction that automate this digitization process. Our study reviewed extraction methods developed by various research groups, and further developed a method of extraction using a combination of image-processing techniques, implemented in four stages, to extract road features from a rural image. In general, a method of extraction is composed of three steps: preprocessing, edge detection, and feature extraction. The method developed in Stage 1 was implemented using Gaussian blurring, the Sobel edge detector, and the Hough transform. Results obtained with this method were not as desired, because roads were extracted as straight lines even where they existed as curvilinear features. Hence, this method was modified in Stage 2 by implementing snakes, using the gradient-descent algorithm. This method yielded better results than Stage 1 by extracting curved as well as straight roads. However, the resultant extracted road had a jagged appearance, due to the snake's movement toward the steepest gradient within the image. This problem was overcome in Stage 3 by using dynamic programming to restrict the movement of the snake to its neighborhood. Results obtained in Stage 3 were smooth and continuous; however, they deviated from the desired road edges at locations with noise. The problem was due to the use of Gaussian blurring at the preprocessing stage, because of its isotropic nature. It was overcome in Stage 4 by implementing the Perona-Malik algorithm, an anisotropic diffusion technique, in place of Gaussian blurring. Results obtained in Stage 4 were better than those of Stage 3 at locations with noise.
Overall, Stage 4 performed better than Stage 3 on visual inspection. To support this conclusion, results from Stages 3 and 4 were evaluated over a set of 10 rural road-segment images, based on goodness of fit and a hypothesis test implemented using the F-test. On both measures, roads extracted in Stage 4 were better than those extracted in Stage 3.

CHAPTER 1
INTRODUCTION

Road networks are essential modes of transportation and provide a backbone for human civilization. Hence, it is vital to maintain and restore roads to keep our transportation network connected. To do this, we must track their existence in both the temporal and spatial domains. The configuration of a road network depends on human needs: a road may be constructed or abandoned depending on the needs of the neighboring community that the road serves. The spatial representation of roads (along with their attributes, or aspatial information) is managed well in a Geographic Information System (GIS). A GIS is a graphical representation of geographic features, with attribute information related or linked to these features. A GIS is used as an analysis and management tool, allowing the detection of changes over time and space. The spatial representation of geographic features, such as linear structures (e.g., roads) and point features (e.g., power poles or manholes), is usually maintained in a GIS in vector format, as opposed to raster format. Digitization of the desired features in a raster image leads to their vector representation. Digitization can be either a manual or an automated process; however, manual digitization of features is time-consuming and labor-intensive.

1.1 Road-Feature Extraction Objectives and Constraints

Ongoing research has led to a gamut of methods that automate the digitization process. Digitization methods are either automatic or semiautomatic in nature. In the literature, an automatic method implies a fully automatic process.
Theoretically, a fully automatic approach requires no human intervention, but this is not practical. Our study considered a method automatic if no human intervention was needed for road-feature extraction at the initiation or processing stage. In a semiautomatic method, human intervention is required at the initiation stage and at times during the processing stage. In both methods, human intervention is needed at the postprocessing stage: to extract undetected but desired features from the raster image, and to fix incorrectly extracted features. An automatic method scores over a semiautomatic method through its ability to automate the operations of the initiation and processing stages. Road-feature extraction from a raster image is a nontrivial and image-specific process; hence, it is difficult to devise one general method to extract roads from any given raster image. According to McKeown (1996), roads extracted from one raster image need not be extracted in the same way from another, as there can be a drastic change in the value of important parameters based on nature's state, instrument variation, and photographic orientation. The existence of other features, both cultural (e.g., buildings) and natural (e.g., trees), and their shadows can occlude road features, complicating the extraction process. This ancillary information provides context for many of the approaches developed (Section 2.3.2). Thus, it is necessary to evaluate the extent of other information needed to identify a road: some extraction cases need minimal ancillary information, and some need a great deal. These limitations point to a need for a method that evaluates multiple criteria in detecting and extracting roads from images. Our study extracted roads solely based on the road characteristics stored implicitly in a raster image.
The parameters used for extraction are the road's shape (a geometric property) and gray-level intensity (a radiometric property). These purely image-based characteristics are affected by external sources, as discussed earlier. No contextual information was used; the method works solely on image characteristics. The method is semiautomatic, with manual selection of the start and end of road segments in the input image. Future work is needed to automate the initiation and road-selection processes, using Kalman filtering and profile matching (Appendix B).

1.2 Feature Extraction from a Geomatics Perspective

Feature extraction spans many applications, ranging from medical imaging to transportation and beyond. In geomatics and civil engineering, the need for feature extraction is project-oriented. For example, extracting features from an aerial image depends on project needs; the goal may vary from detecting tree canopies to detecting manholes. The ability to classify and differentiate the desired features in an aerial image is a critical step toward automating the extraction process. Difficulties faced in implementing extraction methods are due to the complexity of the varied information stored in an aerial image. A good extraction technique must be capable of accurately determining the locations of the necessary features in the image. Detection of a feature object, and its extraction from an image, depends on its geometric, topologic, and radiometric characteristics (Section 2.2).

CHAPTER 2
BACKGROUND

Road-feature extraction from aerial images has been studied over the past two decades, and numerous methods have been developed to extract road features from an aerial image. Road-feature extraction from an aerial image depends on the characteristics of roads and on their variations due to external factors (man-made and natural objects).
A method of extraction is broadly classified into three steps: preprocessing, edge detection, and feature extraction (initialized by a feature-identification step). The efficiency of a given method depends on the image resolution and the input road characteristics (Section 2.1), and also on the algorithms used, which are developed to extract the desired information using a combination of appropriate image-processing techniques. The task is to extract identified road features, which are explicit in nature and visually identifiable to a human, from the implicit information stored in the form of a matrix of values representing either gray levels or color information in a raster image. Digital raster images are portrayals of scenes, with imperfect renditions of objects (Wolf and Dewitt, 2000). Imperfections in an image result from the imaging system, signal noise, atmospheric scatter, and shadows. Thus, the task of identifying and extracting the desired information or features from a raster image is based on criteria developed to determine a particular feature (based on its characteristics within any raster image), while ignoring the presence of other features and imperfections in the image (Section 2.2). Methods of extraction developed in past research are broadly classified as semiautomatic or automatic. Automatic methods are more complex than semiautomatic methods and require ancillary information (Section 1.1), whereas semiautomatic methods extract roads based on information from the input image alone. As part of the literature survey, Section 2.3 explains in detail a semiautomatic method of extraction, developed by Shukla et al. (2002), and an automatic method of extraction, developed by Baumgartner et al.
(1999), selected from the various methods developed in this field of research.

2.1 Road Characteristics

An aerial image is usually composed of numerous features besides roads, both man-made (e.g., buildings) and natural (e.g., forests, vegetation). Roads in an aerial image can be represented based on the following characteristics: radiometric, geometric, topologic, functional, and contextual, as explained in detail later in this section. Factors such as the intensity of light, the weather, and the orientation of the camera can affect the representation of road features in an image in terms of these characteristics, which in turn affects the road-extraction process. The geometric and radiometric properties of a road are usually used as the initial input characteristics in determining road-edge features. Both cultural and natural features can also be used as contextual information to extract roads, along with external data beyond the information in the image (its geometric and radiometric characteristics). Contextual information, and information from external sources, can be used to develop topologic, functional, and contextual characteristics. The automatic method of extraction implemented by Baumgartner et al. (1999) uses these characteristics, as explained in detail in Section 2.3.2. Human ways of perceptually recognizing a road come from examining the geometric, radiometric, topologic, functional, and contextual characteristics of an image. For example, in Figure 2-1, a human will first recognize a road based on its geometric characteristics, considering a road to be a long, elongated feature with constant width and uniform radiometric variance along its length. As shown in Figure 2-1, Road 1 and Road 2 have different overall pixel intensities (a radiometric property) and widths (a geometric property); however, both exist as long, continuous features. Thus, it is up to the discretion of the user to select the appropriate roads to be extracted at the feature-identification step.
If the feature-identification step is automated, the program needs to be trained to select roads based on radiometric variance, which varies depending on the functional characteristics of a road (explained later in this section). As an example, in Figure 2-1, Road 1 and Road 2 have different functional properties and hence different radiometric representations. If a human is unable to locate a road segment because it is occluded by a tree (Figure 2-1) or a car, the human would use contextual information or topologic characteristics: the existence of trees or buildings in the vicinity serves as contextual information, whereas the topologic properties of the roads are used to determine the missing segment of the road network. Thus, to automate the process of determining the presence of a road, there is a need to develop an extraction technique that uses the cues a human would use, giving the system the ability to determine and extract the roads in an aerial image based on the road characteristics described.

Figure 2-1. Road characteristics. This figure illustrates the various characteristics of roads explained in this section, including two roads of different widths and intensities (Road 1 and Road 2), a completely occluded segment (A), and a partially occluded segment (B).

The road characteristics explained in this section are derived from human behavior in road identification, based on the above account of the human interpretation of roads in an image. The following discussion explains each of these road characteristics in detail. Road characteristics are classified into five groups (Vosselman and de Knecht, 1995). A brief description of each follows; two of these characteristics (geometric and radiometric) are used in the semiautomatic method explained in Section 2.3.1, and all of them are used in the automatic method in Section 2.3.2, to identify and extract road features from an aerial image.
2.1.1 Geometric

Roads are elongated, and in high-resolution aerial images they run through the image as long parallel edges with constant width and curvature. A constraint on detection based purely on such characteristics comes from the fact that other features, such as rivers, may be misclassified as roads if an automated procedure to identify road segments is implemented in an extraction method. This leads to a requirement for additional characteristics when extracting roads. In addition, roads within an image may have different widths, based on their functional classification. In Figure 2-1, Road 1 and Road 2 have different widths because of their functional characteristics; they are a local road and a highway, respectively (this issue is discussed in Section 2.1.4). Thus, this characteristic alone cannot be used as a parameter in the automatic extraction of a road from an aerial image.

2.1.2 Radiometric

A road surface is homogeneous and often has a good level of contrast with adjacent areas. Thus, the radiometric properties, or overall intensity values, of a road segment remain nearly uniform along the road in an image. A road's radiometric properties, as a parameter in road characterization, identify a road segment as part of a road based on its overall intensity value when compared to a model, or to the other road segments forming the road network in the image. This works well in most cases, except in areas where buildings or trees occlude the road, or where the presence of cars affects feature detection using this characteristic. Radiometry also varies with the weather and the orientation of the camera at the time of exposure. For example, in Figure 2-1, A illustrates the complete occlusion of a road segment and B the partial occlusion of a road segment due to nearby trees.
A method of extraction based on radiometric properties may fail to identify segments A and B (Figure 2-1), because it cannot match an occluded road segment with the other road segments in the image on radiometric grounds alone: the radiometric characteristics of the occluded segments would be very different from those of the unoccluded segments. In addition, if the identification process is automated and the program is not trained to deal with different pavement types, detection would be affected, since an asphalt road surface may have different road characteristics from a tar road. Hence, a group of characteristics used together identifies a road segment better than any individual characteristic.

2.1.3 Topologic

The topologic characteristics of roads are based on the ability of roads to form networks, with intersections/junctions and terminations at points of interest (e.g., apartments, buildings, agricultural land). Roads generally connect regions or centers of activity in an aerial image; they may begin at a building (e.g., a house, as in Figure 2-1) and terminate at another center of activity, or continue to the edge of the image. Roads tend to intersect and connect with the other roads in an image. Topologic information, as explained above, can be used to identify and extract missing segments of roads. As an example, if we had to extract the roads from the image in Figure 2-1, the radiometric and geometric characteristics of the roads would help extract most of the road segments, but not certain segments affected by shadow occlusion (A) or by the presence of cars and buildings (B) in the vicinity (Figure 2-1). These missing or occluded road segments could be linked to the extracted segments based on the topologic information of the neighboring segments. This characteristic is used in the automatic method of extraction developed by Baumgartner et al.
(1999), as explained in detail later in this chapter (Section 2.3.2).

2.1.4 Functional

Roads, as discussed in the previous section, connect regions of interest, such as residences, industries, and agricultural lands. Therefore, roads may be classified by function as local roads or highways. This functional information is relevant in determining the width of the road and the characteristics of the pavement, which in turn set the radiometric properties, allowing a road to be identified based on its functional classification. In Figure 2-1, Road 1 and Road 2 have different widths (geometric) and overall intensity values (radiometric), since they belong to different functional classes. However, to support the extraction process using this characteristic, an external source of information characterizing the road would be needed, beyond the information stored in the image.

2.1.5 Contextual

With this characteristic we may use additional information, such as shadows, occlusions due to buildings, trees along the side of the road, and GIS maps, to reference roads using historical and contextual information. This information is generated by combining information deduced from the image with information from external sources, such as a GIS database. In Figure 2-1, the occluded road segment could be extracted by combining the extent to which the segment is occluded in the image with the corresponding road's history stored in the GIS database. Of the various characteristics of roads discussed in this section, only the geometric and radiometric properties are inherent and exist as implicit information in any image, whereas functional, topologic, and contextual information can be drawn both from the image and from external data sources, to develop an intelligent approach to identifying and extracting occluded and missing road segments in the image.
The Semi-automatic method explained in Section 2.3.1, implemented by Shukla et al. (2002), illustrates the use of the geometric and radiometric properties of a road as input information for the extraction of road features. Furthermore, in Section 2.3.2, the Automatic method implemented by Baumgartner et al. (1999) illustrates an extraction process in which the initial extraction is carried out using the geometric and radiometric characteristics of the road in an image, supported by extraction using topologic, functional, and contextual characteristics. This chapter also reviews various image-processing techniques that could be implemented to identify and extract road features from an aerial image. In brief, an image processing system is composed of three levels of image processing techniques. These techniques are used in combination to develop methods for road feature extraction from an aerial image, using the characteristics of the features in an image to identify and extract road features. Section 2.2 introduces the various levels of an image processing system, with an example to illustrate each level.

2.2 Image-Processing Techniques

According to the classical definition of a three-level image processing system (Ballard and Brown, 1982; Turton, 1997), image processing is classified into low-level, medium-level, and high-level processes. Low-level processes operate on characteristics of pixels such as color, texture, and gradient. Medium-level processes produce symbolic representations of sets of geometric features, such as points, lines, and regions. In high-level processes, semantic knowledge and thematic information are used for feature extraction. Sections 2.2.1 through 2.2.3 explain the levels of image processing, each with an illustration of a technique and its implementation.
2.2.1 Low-Level Processing

This step is concerned with cleaning an image and minimizing noise (i.e., pixels with an intensity value different from the average intensity value of the relevant region within the image) before further operations are carried out to extract the desired information. One of the simplest low-level processes is to blur an image by averaging the values of the pixels forming the image, thereby minimizing noise: a mean value is calculated for each group of pixel values, reducing the variation in intensity between neighboring pixels.

Table 2-1. Image pixel subset
2 3 3 3 2
4 2 3 4 4
5 2 3 4 5
3 6 6 4 4
The image pixel subset represents an image, with the highlighted values (the top-left 3x3 block) being the pixels considered for convolution with the kernel in Table 2-2.

Table 2-2. Convolution kernel
1/9 1/9 1/9
1/9 1/9 1/9
1/9 1/9 1/9
The convolution kernel is convolved across the image whose pixel values are shown in Table 2-1; its convolution with the highlighted pixel subset of Table 2-1 is explained in this section.

For example, given an image whose subset pixel values are as in Table 2-1, an average is calculated using the convolution kernel in Table 2-2. This kernel calculates an average intensity value from the intensity values of the pixels masked by the kernel, and the average is assigned to the pixel coinciding with the central cell of the kernel. The kernel, while moving across the image, calculates and assigns an intensity value for each pixel in a similar fashion. In Table 2-1, the highlighted portion of the image pixel subset is masked by the kernel of Table 2-2; the total of these cells is 27, and as the kernel is a 3x3 window covering 9 pixels, the average of the 9 pixels is 3. Thus, the pixel coinciding with the central cell of the kernel is assigned a value of 3.
This process assigns the average pixel value to the pixel coinciding with the central cell of the convolution kernel as the kernel moves across the image. Other low-level image processing techniques include convolution with various weighting functions, such as the Gaussian and the Laplacian. Blurring with a Gaussian weighting function involves generating a Gaussian convolution mask that is then convolved with the image to be blurred, in a fashion similar to the kernel averaging discussed earlier in this section using Table 2-1 and Table 2-2. During Gaussian blurring, the generated mask, when convolved with the input image, gives a weighted average value for each pixel relative to the values in its neighborhood, with more weight assigned to values near the central pixel. The resulting blurred image thus differs from averaging or mean blurring, where the average is uniformly weighted.

G(x, y) = (1 / (2πσ²)) e^(−(x² + y²) / (2σ²))    (2-1)

The Gaussian function is calculated using Equation 2-1, resulting in a distribution as shown in Figure 2-2. Here, x and y are the coordinates of a cell of the convolution kernel, and σ is the standard deviation. A convolution kernel is calculated based on its size, with the mean at the center of the kernel and the weights assigned to the kernel cells based on the standard deviation. The kernel is usually truncated at three standard deviations from the center, as beyond that distance the Gaussian distribution is close to zero. Using this kernel, the convolution is performed along the x and y directions to blur the whole image.

Figure 2-2. Gaussian kernel. The Gaussian weighting distribution kernel is analogous to the kernel in Table 2-2, but with higher weights assigned to pixels close to the central pixel.

The conventional Gaussian blurring process is isotropic in nature, as it blurs the image in the same fashion in all directions.
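The construction of a Gaussian kernel from Equation 2-1 can be sketched as below. This is an illustrative Python/NumPy sketch, not code from the thesis; the kernel size and σ are arbitrary example values.

```python
import numpy as np

def gaussian_kernel(size=5, sigma=1.0):
    """Build a normalized Gaussian convolution kernel (Equation 2-1),
    with the mean at the center cell and weights falling off with
    distance, unlike the uniform 1/9 weights of the mean kernel."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    g = np.exp(-(x**2 + y**2) / (2.0 * sigma**2)) / (2.0 * np.pi * sigma**2)
    return g / g.sum()  # normalize so the weights sum to 1

k = gaussian_kernel(5, 1.0)
# The center weight is the largest; weights decay symmetrically outward.
print(k[2, 2] > k[2, 3] > k[2, 4])  # True
```

Convolving an image with this kernel, in the same manner as the mean kernel above, produces Gaussian blurring; since the weights depend only on distance from the center, the kernel blurs uniformly in all directions (isotropically).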
This process does not respect the boundaries between regions in an image, and so it affects edges, moving them from their original position. Hence, instead of conventional Gaussian blurring, the method of extraction developed in our study implements the Perona-Malik algorithm (Malik and Perona, 1990), which blurs an image according to an anisotropic diffusion principle. In the Perona-Malik algorithm, images are blurred within regions while the edges are kept intact and enhanced, preserving the boundaries between regions. Chapter 3 introduces the principle of isotropic and anisotropic diffusion (Section 3.1) and its implementation in the Perona-Malik algorithm (Section 3.2).

2.2.2 Medium-Level Processing

Medium-level processing is a step toward image classification; some image processing techniques at this level classify the image into regions by themselves. One of the simplest forms of image classification is thresholding. When thresholding an image, the pixels within the image are classified based on a threshold intensity value. For example, given a gray scale image with intensity values ranging from 0 to 255, a binary (two-class) image is obtained by assigning an intensity of 0 to all pixels below the set threshold value and 1 to all pixels above it. Other techniques detect the edges within an image, which can be used to visually identify boundaries between regions and to support high-level feature extraction processes. This level of processing is mostly used to determine edges, or boundaries between regions, in an image. What follows is an explanation of the principle of edge detection in an image, and the workings of the Sobel edge detector, a medium-level image processing technique.

Figure 2-3. Edge detection.
A) Edge image with a bright region in the center and dark regions at the boundary of the image. B) Edge image with a dark region in the center and bright regions along the boundary of the image. C) Horizontal line profile of edge image A. D) Horizontal line profile of edge image B. E) First derivative of edge image A. F) First derivative of edge image B. G) Second derivative of edge image A. H) Second derivative of edge image B.

An edge in an image represents a significant change in intensity between pixels in the image. Edges detected in an image are usually used as information about the boundaries between regions in the image, or to build a shape description of an object in the image. The concept of edge detection is explained further using the illustration in Figure 2-3. An edge exists as a ramp within an image; in Figure 2-3, two edges exist in each of A and B. In A and B, the edges delineate a dark region and a bright region, with a bright region in the center of A and a dark region in the center of B.

|∇f(x, y)| = √((∂f/∂x)² + (∂f/∂y)²)    (2-2)

∠∇f(x, y) = arctan((∂f/∂y) / (∂f/∂x))    (2-3)

If A and B in Figure 2-3 are considered continuous along x and y, then f(x, y) represents the image. Derivatives along the x (∂f/∂x) and y (∂f/∂y) directions, also known as directional derivatives, are calculated from the input image. Edges within the image are determined based on Equations 2-2 and 2-3, which are calculated from the directional derivatives: Equation 2-2 gives the magnitude of the gradient, and Equation 2-3 gives the orientation of the gradient. Simple edge detectors, developed at the medium level, detect edges based on the gradient information obtained for an input image using Equations 2-2 and 2-3. In Figure 2-3, C and D show the profiles of pixel intensity across A and B respectively.
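Equations 2-2 and 2-3 can be sketched numerically as follows. This is an illustrative Python/NumPy sketch (not from the thesis), approximating the directional derivatives with finite differences; `arctan2` is used as a quadrant-aware variant of the arctangent in Equation 2-3.

```python
import numpy as np

def gradient_magnitude_orientation(img):
    """Approximate df/dx and df/dy with finite differences, then
    apply Equation 2-2 (magnitude) and Equation 2-3 (orientation)."""
    fy, fx = np.gradient(img.astype(float))  # np.gradient returns (d/drow, d/dcol)
    magnitude = np.sqrt(fx**2 + fy**2)       # Equation 2-2
    orientation = np.arctan2(fy, fx)         # Equation 2-3, quadrant-aware
    return magnitude, orientation

# A horizontal ramp edge: intensity rises left to right, as in the
# line profiles of Figure 2-3.
ramp = np.tile(np.array([0.0, 0.0, 1.0, 2.0, 2.0]), (5, 1))
mag, _ = gradient_magnitude_orientation(ramp)
# The gradient magnitude peaks on the ramp, not in the flat regions.
print(mag[2].tolist())  # [0.0, 0.5, 1.0, 0.5, 0.0]
```

The peak of the magnitude profile marks the edge region, matching the first-derivative curves E and F in Figure 2-3.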
In Figure 2-3, E and F give a graphical representation of the gradient calculated using Equations 2-2 and 2-3; the gradient graphs in E and F represent the change in intensity of pixels across the image. Edges within an image are detected by determining the local maxima of the magnitude of the image gradient (Equation 2-2). The peaks in E and F represent the locations of the edges in images A and B of Figure 2-3. Detecting edges using the magnitude of the gradient (first derivative) yields a region rather than a specific edge location. Edges can be located more precisely using the second derivative, or rate of change of the gradient. In Figure 2-3, G and H give a graphical representation of the rate of change of the gradient (second derivative). The second derivative becomes zero where the first derivative reaches a maximum; hence, edges can be identified by locating the points at which the second derivative of the image crosses zero, instead of identifying local maxima of the first derivative within the image.

This section further discusses the working of the Sobel edge detector, which performs gradient measurement and locates regions with high gradients that correspond to edges within an image. The Sobel edge detector is a convolution kernel commonly used in image processing to determine regions of high spatial gradient, i.e., regions where there is a significant change in intensity from the neighboring pixels. Generally, these regions lie along boundaries within an image, or exist as noise within a homogeneous region. A Sobel edge detector usually consists of two 3x3 kernels, as shown in Figure 2-4. In Figure 2-4, a pseudo-convolution kernel representing the input image is convolved along the x and y directions to determine the edges in an image using Gx and Gy. The convolution masks (Gx and Gy), when moved through an image, compute the gradient along the x and y directions and respond maximally to edges along x and y.

Figure 2-4. Sobel edge detector.
A) Convolution kernel along x, computing the gradient along x, represented by Gx. B) Convolution kernel along y, computing the gradient along y, represented by Gy. C) Pseudo-convolution kernel from whose pixel values the gradients are determined.

In Figure 2-4, the gradients along the x and y directions are computed by convolving the Sobel convolution kernels with the pseudo-convolution kernel, using Equation 2-4 and Equation 2-5.

Gx kernel:        Gy kernel:        Pseudo-convolution kernel:
-1  0  +1         +1  +2  +1        P0  P1  P2
-2  0  +2          0   0   0        P7  [c]  P3
-1  0  +1         -1  -2  -1        P6  P5  P4

Gx = (P2 + 2P3 + P4) − (P0 + 2P7 + P6)    (2-4)

Gy = (P0 + 2P1 + P2) − (P6 + 2P5 + P4)    (2-5)

The magnitude of the gradient is calculated by

|G| = √(Gx² + Gy²)    (2-6)

The direction of the gradient is the arctangent of the ratio of the gradients along the y and x directions:

θ = arctan(Gy / Gx)    (2-7)

The detector then uses the magnitude of the gradient obtained using Equation 2-6 to respond maximally to regions within an image that have spatial gradients similar to the convolution masks in A and B (Figure 2-4). Section 2.2.3 introduces high-level processing techniques in an image processing system, which identify and extract desired objects from an image based on information obtained through low- and medium-level image processing techniques.

2.2.3 High-Level Processing

In this step, information gathered from the low- and medium-level image processing techniques is used as input to identify and extract desired objects or regions from an image. The simplest form of high-level processing is to label the desired regions with one value, while leaving the rest of the image at zero, by applying a threshold to the original image. More complex image processing techniques at this level detect and extract shapes within an image. Prominent techniques from this level include the Hough transform and the Snakes (deformable contour model) method.
During various stages of the development of a method of road extraction in our study, both of these techniques were implemented. The Hough transform is an image processing technique used to detect or extract features of a particular shape from an image; it applies to features that can be represented in parametric form, and detects regular geometric features such as lines, ellipses, circles, and parabolas. The Hough transform works best with large images where the effects of noise and undesired features are minimal. However, it is difficult to apply to the detection of higher-order curves, those with orders greater than 2. An explanation of how the Hough transform extracts linear features from an image follows.

Consider an edge-detected image with a set of point locations (edge pixels) that represent a boundary in the image, as shown in Figure 2-5. In Figure 2-5, a number of line segments can connect combinations of points 1, 2, and 3 to represent a linear edge. The following parametric representation of a line is central to the Hough transform. Each of the possible segments connecting a set of points can be represented in the form of Equation 2-5 by varying the values of ρ and θ, which uniquely identify a single line:

x cos θ + y sin θ = ρ    (2-5)

θ is the orientation of the line with respect to the origin, and ρ is the length of the normal to the line from the origin, as in A (Figure 2-5). The objective is to pick the best-fit line, the one passing through the maximum number of edge pixels (here, 3), as shown in Figure 2-5. Each of the three edge pixels can have many lines passing through it, as shown by the red and bold black lines in A (Figure 2-5). The objective of the Hough transform is to pick the line that passes through the maximum number of edge pixels: the black line in A (Figure 2-5).

Figure 2-5. Hough transform. A) Edge pixels with candidate lines in image space. B) Hough accumulator space, in which each line is represented in parametric form and defined by cells.
As shown in A (Figure 2-5), numerous lines pass through each of the points. The lines passing through an edge pixel can be uniquely identified by their values of ρ and θ, in the parametric form of Equation 2-5. Each (ρ, θ) pair uniquely identifies a cell, representing a line, in the Hough accumulator space shown in B (Figure 2-5). The splines in B (Figure 2-5) are the representations of the edge pixels in Hough space; the three curves represent the three edge pixels in A (Figure 2-5). As the spline for each edge pixel passes through the accumulator cells in Hough space, it increments the count in each cell it crosses, so each cell accumulates the number of edge pixels through which its line passes, where each line is uniquely identified by a (ρ, θ) value. Thus, the best-fit line, the one passing through the maximum number of edge pixels in the image, corresponds to the accumulator cell with the highest count, and the line for that cell is picked to represent an edge in the original image. In B (Figure 2-5), the cell in which all three splines intersect has the highest count of edge pixels; hence it is considered the best-fit line, and the corresponding black line in A (Figure 2-5) represents the edge.

During the initial stage of this research, an implementation of the Hough transform to extract road features was attempted but not pursued, as it extracted road features from the image as straight lines, whereas roads typically exist as splines or curvilinear features in an image. This led to the implementation of Snakes (Active Contour model) to extract roads, as they represent road features better than Hough lines. Section 2.3 introduces various methods of road feature extraction developed over the past couple of decades, and discusses in detail a Semi-automatic and an Automatic approach to road feature extraction.
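The accumulator voting just described can be sketched as follows. This is a minimal Python/NumPy illustration, not the thesis's implementation; the bin sizes (`rho_step`, `theta_steps`, `rho_max`) are arbitrary example parameters.

```python
import numpy as np

def hough_lines(edge_pixels, rho_step=1.0, theta_steps=180, rho_max=100.0):
    """Vote in a (rho, theta) accumulator: each edge pixel increments
    every cell whose line x*cos(theta) + y*sin(theta) = rho passes
    through it. The cell with the highest count is the best-fit line."""
    thetas = np.linspace(0.0, np.pi, theta_steps, endpoint=False)
    n_rho = int(2 * rho_max / rho_step) + 1
    acc = np.zeros((n_rho, theta_steps), dtype=int)
    for x, y in edge_pixels:
        for t_idx, theta in enumerate(thetas):
            rho = x * np.cos(theta) + y * np.sin(theta)
            r_idx = int(round((rho + rho_max) / rho_step))  # shift rho >= 0
            acc[r_idx, t_idx] += 1
    return acc, thetas

# Three collinear edge pixels on the line y = x (rho = 0, theta = 135 deg).
pixels = [(1, 1), (2, 2), (3, 3)]
acc, thetas = hough_lines(pixels)
# All three splines intersect in the cell for rho = 0, theta = 135 deg,
# so that cell holds the maximum count of 3.
r_idx = int(round((0.0 + 100.0) / 1.0))
print(acc[r_idx, 135], acc.max())  # 3 3
```

With coarse bins, neighboring cells can also reach the maximum count for nearly collinear votes; practical implementations typically smooth or non-maximum-suppress the accumulator before picking peaks.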
2.3 Approaches to Road Feature Extraction

Numerous methods have been developed to extract road features from an aerial image. Table 2-3 lists a few of the road extraction methods reviewed here, as part of the literature survey, prior to the development of the method of extraction in our study. Methods of extraction have been developed using combinations of image processing techniques, and the techniques implemented may be common to more than one of the listed methods. Road extraction methods are broadly classified into Semi-automatic approaches and Automatic approaches, as discussed in Section 1.1. The methods listed in Table 2-3 include a group of Semi-automatic approaches and the Automatic approach developed by Baumgartner et al. (1999). According to McKeown (1996), one of the early researchers involved in developing road feature extraction methods, every image considered for the extraction of a desired feature is unique; hence, it is difficult to have a general method for extracting road features from any image.

Table 2-3. Methods of extraction
Method of Extraction | Research Group
Cooperative methods of road tracking using road follower and correlation tracker | McKeown and Denlinger (1988)
Road feature extraction using camera model and snakes | Gruen and Li (1995)
Road feature tracing by profile matching and Kalman filter | Vosselman and de Knecht (1995)
Multi-scale and Snakes for automatic road extraction | Baumgartner et al. (1999)
Detection of roads from satellite images using optimal search and Hough transform | Rianto et al. (2002)
Semi-automatic road extraction algorithm for high-resolution images, using path-following approach | Shukla et al. (2002)

Methods for road feature extraction have been pursued for the past couple of decades.
Methods of extraction developed in the early days of this field of research were carried out with manual initialization of the process; these are known as Semi-automatic extraction approaches. A cooperative method of extraction (McKeown and Denlinger, 1988), one of the earliest road feature extraction methods, was developed using a combination of image processing techniques; it extracted roads by edge tracking and texture correlation matching from the input image. These processing techniques (edge tracking and correlation matching) supported each other in detecting road features in case either of them failed during the extraction process; hence the name cooperative method of extraction. Later, in 1995, a Semi-automatic approach for road extraction was developed using a digital terrain model, a type of camera model, along with dynamic programming and Snakes (Gruen and Li, 1995). This approach extracted road edges by concatenating a set of points that represented road locations. Another Semi-automatic approach, developed around the same time, extracted road features using the Kalman filter and profile matching (Vosselman and de Knecht, 1995). During the evolution of the various methods of road feature extraction, a research group led by Baumgartner et al. (1999) developed an Automatic approach. Most of the methods developed until then had similar extraction steps, but this method tried and tested a different combination of image processing techniques, working in cooperation with each other in modules. Our study will further discuss one Semi-automatic method of extraction, the Semi-automatic road extraction algorithm for high-resolution images using the path-following approach (Shukla et al., 2002), and one Automatic method of extraction, the Multi-scale and Snakes road feature extraction method developed by Baumgartner et al. (1999).
Furthermore, the method of extraction developed in our study uses a combination of image processing techniques and evolved over several stages, using cues from past research. An initial attempt was made to extract roads using the Hough transform, based on a concept from the method of extraction developed by Rianto et al. (2002), although the results obtained were not as desired. Hence, many combinations were tested; the final method of extraction implemented in our study, developed at the final stage (stage 4, Section 5.1), uses the Perona-Malik algorithm (Malik and Perona, 1990), based on the anisotropic diffusion principle, together with Snakes. As part of our study, an attempt was also made to automate the initialization, or road segment identification, stage prior to extraction (Section 5.2.1) using the Kalman filter and profile matching (Vosselman and de Knecht, 1995). Appendix B gives a detailed explanation of the principle and working of the Kalman filter, along with its implementation for detecting road segments using profile matching. Sections 2.3.1 and 2.3.2 explain in detail the methods of extraction developed by Shukla et al. (2002) and Baumgartner et al. (1999), under the Semi-automatic and Automatic approaches to road feature extraction respectively.

Before discussing and evaluating the approaches that have been developed for road feature extraction from an aerial image, some general observations about roads are in order. Roads are generally uniform in width in high-resolution images, and appear as lines in low-resolution images, depending on the resolution of the image and the functional classification of the road. In the Automatic approach discussed below, road features are extracted at various resolutions, using contextual information to complete the extraction of roads from an input aerial image.
In both approaches (Automatic and Semi-automatic), there is a need for human intervention at some point during the extraction process. A Semi-automatic approach requires initial human intervention, and at times intervention during the extraction process, whereas an Automatic approach needs human intervention only at the post-processing stage. In the Semi-automatic approach, road detection is initialized manually with points representing roads, called seed points; the roads are then tracked using these seed points as initial estimates of road feature locations. In a fully Automatic approach, the roads are extracted without any human intervention. Post-processing is carried out for misidentified and unidentified roads in both approaches.

2.3.1 Road Extraction Algorithm Using a Path-Following Approach

A Semi-automatic method is usually implemented using one of the techniques below.
* After initialization, the road is mapped using a road tracing algorithm.
* A sparse set of points is distributed along a road segment and then concatenated to extract the desired road segment.

McKeown and Denlinger (1988) developed a method to track and trace roads in an aerial image using an edge detector and texture correlation information (Table 2-3), whereas Gruen and Li (1995) implemented a road tracing technique using a sparse set of points spaced along the road to be mapped, concatenated using dynamic programming. This section explains in detail the Semi-automatic method of extraction using the path-following approach developed by Shukla et al. (2002). In this method, a road extraction algorithm extracts roads using the width and variance information of a road segment, obtained through the preprocessing and edge detection steps, similar to McKeown and Denlinger (1988) and Vosselman and de Knecht (1995).
This process, being a Semi-automatic approach, is initialized by the selection of a minimum of two road seed points. These seed points are used to determine the center of the road programmatically from the edge-detected image. After the desired points representing the initial road segment are obtained, its orientation and width are calculated. The orientation of the initial seed point determines the three directions along which the next road segment could exist. Of the three directions, the one having minimum cost (i.e., the minimum variance based on intensity, or radiometric, information) is taken as the next road segment. This process is carried out iteratively, as long as the cost remains within a predefined variance value. Below is a detailed, systematic explanation of this approach. Figure 2-6 gives a flow diagram of the extraction process developed using the path-following approach (Shukla et al., 2002).

Figure 2-6. Path-following approach. The flowchart gives a brief overview of the extraction process using the path-following approach, explained in detail in this section.

Preprocessing (scale space diffusion and edge detection). The original image is diffused, or blurred, at this step (Figure 2-6) into a sequence of images at different scales. The blurring is carried out using non-linear anisotropic coherence diffusion (Weickert, 1999), as this minimizes variance within the regions of an image and helps maintain the homogeneity of regions. The variance across sections of the road segment is then used to estimate the cost on the basis of which the road is traced. The anisotropic diffusion approach is a non-uniform blurring technique, as it blurs regions within an image based on predefined criteria; this differs from Gaussian blurring, which blurs the entire image in the same manner.
The image diffused using the above technique is then used to compute the radiometric variance across the pixels in the image. Edges are then detected from the diffused image using a Canny edge detector. The edge-detected image is used later in the extraction process to calculate the width of the road across road segments.

Figure 2-7. Road seed selection. The black line represents the initial seed points selected by the user.

Figure 2-8. Width estimation. The road width and direction of the road are estimated from the initial seed points selected as in Figure 2-7.

Selection of initial seed points. As this algorithm is a Semi-automatic approach to road feature extraction, the process of detecting and extracting road segments is initialized by manual selection of road seed points. Road seed points, as in Figure 2-7, are two points selected by the user on or near a road segment in an image, forming a line segment that represents the road to be extracted. Figure 2-8 illustrates a road seed with ends a and b; comparing Figure 2-7 and Figure 2-8, a and b correspond to the end points of the black road seed in Figure 2-7.

Orientation and width estimation. In Figure 2-8, the orientation of the current seed segment ab gives the direction of the road, on the basis of which the rest of the road segments can be determined. The width of the road at the given seed point is estimated by calculating the distance from the parallel edges, gh and ef, to the road seed ab, as in Figure 2-8. At this point the width of the road at the initial seed points is estimated, along with the orientation of the road. The orientation of the road at the initial seed points indicates three directions in which the road could propagate to form the next road segment.

Figure 2-9. Cost estimation.
This figure gives the possible orientations of the next road segment, based on the information about the initial road segment obtained from Figure 2-7 and Figure 2-8.

Cost estimation in three directions. As shown in Figure 2-9, there are three directions, bc, bd, and be, along which the road segment could propagate, based on the current orientation of the seed segment ab. The edges gg' and hh' are road edges parallel to the current road seed ab. Thus, if ab is the current direction of the road segment, bc, bd, or be are the possible choices of direction for the next road segment. In this algorithm, the minimum of the lengths in the three directions bc, bd, and be is considered to be the width of the road at the current node b, as in Figure 2-9. Furthermore, each of the three directions bc, bd, and be is assigned a weight, with the line closest in direction to the previous road segment assigned the minimum weight (bd in Figure 2-9). After assigning weights to each direction, a cost factor is computed using Equation 2-6:

Cost_bd = (Variance_bd × Direction_bd) / Length_bd    (2-6)

where

Variance_bd = Σ (pixel value − mean_bd)² / Length_bd    (2-7)

Once the cost is estimated in the three directions using Equations 2-6 and 2-7, the path having the minimum cost is chosen. The cost value is stored and used to determine the road direction in the next target window. This process continues as long as the cost factor remains within the set values, forming consecutive target windows and determining the minimal-cost road direction at each node. Once all the road points are obtained, the road is traced through the set of points to extract the minimum-cost path. This approach is also called the minimum path following approach, as the path having the minimum cost is selected until the end of the road is reached, and the selected points are connected to form the final extracted road.
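The direction-selection step can be sketched as below. This is an interpretive Python sketch of Equations 2-6 and 2-7, not the algorithm of Shukla et al. (2002) itself; the intensity profiles and direction weights are hypothetical example values.

```python
import numpy as np

def direction_cost(pixels, direction_weight):
    """Cost of one candidate direction (Equations 2-6 and 2-7):
    the variance of the intensities sampled along the candidate
    segment, scaled by a direction weight (smallest for the direction
    closest to the previous segment) and divided by segment length."""
    pixels = np.asarray(pixels, dtype=float)
    length = len(pixels)
    variance = np.sum((pixels - pixels.mean()) ** 2) / length  # Equation 2-7
    return (variance * direction_weight) / length              # Equation 2-6

# Hypothetical intensity profiles along the three candidate directions
# bc, bd, be; bd continues the current heading, so its weight is lowest.
candidates = {
    "bc": ([80, 95, 60, 110, 75], 1.2),
    "bd": ([90, 91, 89, 90, 90], 1.0),   # homogeneous road surface
    "be": ([40, 120, 30, 140, 50], 1.2),
}
costs = {d: direction_cost(p, w) for d, (p, w) in candidates.items()}
best = min(costs, key=costs.get)
print(best)  # bd
```

The homogeneous profile along bd yields the lowest variance, and hence the lowest cost, so bd is chosen as the next road segment, mirroring the tracing behavior described above.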
While tracing roads, the parameters vary drastically at intersections, as explained below.

Figure 2-10. Road traversal at intersection.

There are instances, such as junctions or road intersections, where the width of the road at a point on the junction suddenly exceeds the width at the previously traversed point on the road segment, and the paths in all directions have the same minimum cost. As seen in Figure 2-10, at the junction, point c has a greater width than the other road segment points, and the paths in all directions have an equal minimum cost. The width problem is overcome by backtracking: the width at that point is reduced by considering the width of the predecessor point that was traversed. The equivalence of the minimum path values is resolved by following one path and tracing the remaining paths after the whole extraction process is completed. An issue associated with this method of extraction, as with any Semi-automatic approach, is its inability to extract road segments occluded by shadows and other obstructions, which then need to be initialized manually by the user.

Section 2.3.2 illustrates the working of an Automatic approach to road feature extraction, implemented by Baumgartner et al. (1999). This method of extraction, as its name suggests, does not need an initialization or feature identification step; these functions are performed by the feature extraction method itself. It includes some processes that, if implemented as standalone processes, would work as a Semi-automatic method of extraction.

2.3.2 Multi-Scale and Snakes Road-Feature Extraction

The automatic method of extraction developed by Baumgartner et al. (1999), explained in this section, gives an idea of the working of an Automatic method of extraction, using information from various sources to extract road features from an aerial image without any human intervention (Section 2.1).

Figure 2-11.
Global road-feature extraction. This picture illustrates the two models used to extract road features in an image automatically over three modules. Figure 2-11 illustrates an automatic method of extraction developed by Baumgartner et al. (1999) to extract road features from aerial images using information from coarse-resolution and fine-resolution images. The method of extraction is divided into two models, a road model (A) and a contextual model (B), as shown in Figure 2-11. The road model extracts the roads from fine and coarse resolutions of an input aerial image. At coarse resolution, roads exist as splines or long linear features, with intersections and junctions as blobs. At fine resolution, roads exist as long homogeneous regions with uniform radiometric variance. The road model extracts roads at coarse resolution by assuming that road segments exist as long, bright linear features. At fine resolution, the road model uses real-world information (A1) (e.g., road pavement markings, geometry). It also uses material information (A2), determined from the width of the road segment and its overall radiometric variance, which depend on the pavement type or material (e.g., asphalt or concrete), and image characteristics indicating whether the identified road segment is an elongated bright region. In brief, the road model introduced above extracts roads based on the road segment's geometric, radiometric, and topologic characteristics (Section 2.1). The method of extraction developed by Baumgartner et al. (1999) also includes a context model (B in Figure 2-11) that extracts road segments from the input image using information about other features that exist near the road segment. The context model extracts the road from an input image using a global context and a local context. These contexts support each other in the process of extraction.
The global context assigns an input image to an urban (B1), rural (B2), or forest (B3) setting, as in Figure 2-11. The local context exists within the input image: for example, a tree or building near a road segment that occludes the road with the feature itself or its shadow, or individual road segments existing by themselves. A tree occluding a road segment could occur whether the global context is urban, rural, or forest, whereas a building or its shadow occluding a road segment could only occur in an urban or a rural area, where buildings such as residences or industrial infrastructure may exist. Thus, the global and local contexts within the context model work together to extract road segments. This section explains in detail the method of extraction that uses the road model and context model, with the example of rural (global context) road feature extraction. Another significant point is that roads existing in an urban area may not be extractable in the same fashion as those in a rural area, since they may have different geometric and radiometric characteristics and contextual information. Thus, the local context within an input image is assigned to a global context, based on which roads are to be extracted. The model used depends on what information is needed to extract a road. Salient roads (Figure 2-12), which are clearly visible and are neither occluded nor missing sections, may be extracted using geometric and radiometric characteristics, the geometry and material parts of the road model. Figure 2-12. Salient road. The road in gray in this picture is a salient road, as it is not occluded, no section of it is missing, and it exists as a continuous feature across the image. Figure 2-13. Nonsalient road. The road in this picture is a nonsalient road, as it is partially occluded by the shadows of trees, thus affecting the radiometric and geometric properties of the road.
Nonsalient roads (Figure 2-13), road segments within an aerial image that are occluded by the shadow of a tree or building, may need the use of a context model to extract them from the image.

Table 2-4. Modules of extraction
    Module I (Local Extraction): salient road extraction; nonsalient road extraction; road junction linking.
    Module II (Global Extraction): low-level processing; fusion; graph representation; road network generation.
    Module III (Network Completion): generation of link hypotheses; verification of hypotheses; insertion of accepted road hypotheses.

The extraction is composed of three modules, through which roads in an image are extracted using the road and context models in combination. As per the strategy of extraction developed by Baumgartner et al. (1999), salient roads are extracted first, followed by the extraction of nonsalient roads. This order is followed because extracted salient road segments can help to guide the extraction of nonsalient road segments, as explained in detail later in this section. After the extraction of all roads, a network is generated by connecting salient and nonsalient roads, forming a road network within the input aerial image. The method of extraction developed using the road model and context model can be broadly classified into three modules, as in Table 2-4. Module I performs road extraction in a local context, using a high-resolution image, initialized by the extraction of salient road segments, followed by nonsalient road segment extraction, and then the extraction of the junctions or intersections that connect the extracted road segments. Module II performs extraction in a global context, as a low-level processing step, using a low-resolution image as input. This is followed by the fusion of the road segments extracted at the local level in Module I with those from the first (low-level processing) step of Module II.
The final step of Module II involves the generation of a graph representing the road network from the road segments produced by the fusion. Road segments obtained through this fusion represent the edges, and their ends represent the set of vertices of the generated graph. Module III of the developed method improves the extracted road network obtained through Modules I and II. It does so by the generation of link hypotheses and their verification, leading to the insertion of links. This allows complete extraction of the road segments forming a network, without any broken road segment links. The remainder of this section explains briefly the implementation of each module. 2.3.2.1 Module I This module uses edge and line information to begin extraction. Hypotheses for the locations of the salient roads in the image are determined from the extracted line and edge information in the image. Extracted salient roads, along with local contextual information, are then used for the extraction of nonsalient roads. Then, in the final step of Module I, the road junctions are constructed geometrically, using the road information extracted by the end of this module. Information about salient roads, nonsalient roads, and road junctions is passed on as input to Module II. Figure 2-14. Salient road-feature extraction. A) Represents the extracted road centerline in black and edges in white. B) Represents the road quadrilaterals formed from the extracted road edge and centerline information in A. (Picture courtesy of Baumgartner et al. (1999), Figure 5, Page 6.) Salient road extraction. In this step, roads are extracted at a local level, using edge and line information extracted from the fine-resolution input image and from the image at a coarse resolution. Figure 2-14A represents the road lines, extracted using a coarse-resolution image, in black, and road edges, extracted from a fine-resolution image, in white. The distance between the pair of extracted edges must be within a certain range.
The minimum and maximum distance depends on the class of road being extracted. For an extracted edge pair to be considered as road edges, it must fulfill the following criteria:
* Extracted pairs of edges should be almost parallel.
* The area enclosed by a pair of parallel edges should have homogeneous radiometric variance along the road segment.
* There should be a road centerline extracted along the center of the extracted road edges. As in Figure 2-14A, the black road centerlines lie along the middle of the extracted white road edges.
The edges are selected as road edges by the local fusion of extracted lines and road edges. Using the road edge information, road segments are constructed as quadrilaterals (Figure 2-14) generated from the parallel road edges. Quadrilaterals sharing points with neighboring quadrilaterals are connected. The points on their axes, along with the road width, represent the geometry of the road segments. This road information is used as semantic information for the extraction of nonsalient parts of the road network in the next step of Module I. Nonsalient road extraction. Nonsalient road segments cannot be extracted in the same way as salient road segments, since they are occluded by the presence of cultural (e.g., buildings) or natural (e.g., trees) objects or their shadows. Thus, to extract a nonsalient road, additional knowledge is needed beyond the information required for the extraction of salient roads. This step of Module I extracts nonsalient road segments by linking the extracted salient roads obtained from the previous step, on the assumption that the nonsalient road segments are gaps between salient road segments. In addition to the linking of nonsalient roads, incorrect hypotheses for salient road segments are eliminated at this step.
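A minimal sketch of how the three edge-pair criteria above might be tested is given below. The function name and all thresholds (angle tolerance, width range, allowed intensity standard deviation) are illustrative assumptions, not values from Baumgartner et al. (1999).

```python
def is_road_edge_pair(dir1_deg, dir2_deg, width, gray_values,
                      min_width=3.0, max_width=15.0,
                      max_angle_diff=5.0, max_std=10.0):
    """Check a candidate pair of edges against the three road-edge criteria."""
    # Criterion 1: the two edges should be almost parallel
    # (angle difference folded so 2 deg and 178 deg count as parallel).
    angle_diff = abs((dir1_deg - dir2_deg + 90) % 180 - 90)
    if angle_diff > max_angle_diff:
        return False
    # Criterion 2: edge separation must lie in the width range for the road class.
    if not (min_width <= width <= max_width):
        return False
    # Criterion 3: the enclosed area should be radiometrically homogeneous.
    mean = sum(gray_values) / len(gray_values)
    std = (sum((g - mean) ** 2 for g in gray_values) / len(gray_values)) ** 0.5
    return std <= max_std
```

The centerline criterion is omitted here since it requires the separately extracted line data; a fuller sketch would also test that a coarse-resolution line lies between the edge pair.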
Because most of the road segments extracted by the fusion of local edge and line information in the previous step are short, the linking of correct road segments and the elimination of false road segments is achieved by grouping salient road segments into longer segments. This process is performed using a hypothesis-and-test paradigm that groups short salient road segments, bridging the gaps as well as extracting the nonsalient road segments. Hypotheses about which road segments should be bridged are generated by comparing the geometric (width, collinearity, and distance) and radiometric (mean gray value, standard deviation) properties of the segments to be linked. The road segments are verified through three stages, using the following hypotheses:
* In the first stage, the radiometric properties of the segments to be linked are compared. If the difference between the radiometric properties is not too great, then the connection hypothesis is accepted.
* If the connection hypothesis is not accepted in the first stage, the "ribbon snake" approach is applied to find an optimal path connecting the salient road segments. If this also fails, final verification is performed using local context information.
* The final verification is the weakest form of hypothesis testing; at this stage, local contextual information is used to extract the nonsalient roads.
Figure 2-15. Nonsalient road-feature extraction. A) Represents an occluded road segment and the extracted salient road edges in white. B) Represents the occluded road segment extracted using the optimal path. C) Represents the road extracted using optimal width information. D) Represents the road extracted based on the constant-width hypothesis. (Picture courtesy of Baumgartner et al. (1999), Figure 8, Page 8.)
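The three-stage cascade above can be outlined structurally as follows. This is a sketch only: the ribbon-snake and local-context checks are left as placeholders, and all names and the gray-value threshold are hypothetical.

```python
def find_ribbon_snake_path(seg_a, seg_b):
    # Placeholder: a real version would optimize a ribbon-snake energy
    # between the two segment ends and return the connecting path.
    return None

def consistent_with_local_context(seg_a, seg_b):
    # Placeholder: a real version would inspect nearby context objects
    # (trees, buildings, shadows) that could explain the gap.
    return False

def verify_connection(seg_a, seg_b, max_gray_diff=15.0):
    """Return which stage accepted the link hypothesis, or None if rejected."""
    # Stage 1: accept if the radiometric properties of the segments are similar.
    if abs(seg_a["mean_gray"] - seg_b["mean_gray"]) <= max_gray_diff:
        return "radiometry"
    # Stage 2: try to bridge the gap with an optimal ribbon-snake path.
    if find_ribbon_snake_path(seg_a, seg_b) is not None:
        return "ribbon_snake"
    # Stage 3: weakest test, based on local context information.
    if consistent_with_local_context(seg_a, seg_b):
        return "context"
    return None
```

The ordering encodes the text's point that each later stage is a weaker form of verification, tried only when the stronger one fails.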
Figure 2-15A illustrates an occluded, or nonsalient, road segment, with the corresponding extracted salient road segment in white; this is used to give the initial hypothesis. In Figure 2-15, B represents the road extracted using the optimal path process, C the road extracted by optimal width verification, and D the road extracted by selecting the hypothesis on the basis of constant width. As can be seen from the results, the road extracted by the hypothesis based on the geometric characteristics of the road gives a better result than the other verification stages. Figure 2-16. Road linking. This figure illustrates the extracted road edges in white, with their links represented in black and junctions as white dots. (Picture courtesy of Baumgartner et al. (1999), Figure 9, Page 8.) After the extraction of salient and nonsalient roads in Module I, the extracted road segments need to be connected. The connection of road segments is performed in the final step of Module I through road junction linking. Road junction linking. The hypotheses concerning junctions are based on geometric calculations. In this step of Module I, the extracted road segments are extended at their unconnected ends. If an extension intersects an already existing segment, then a new road segment is constructed that connects the intersection point with the extended road. The verification of these new road segments is performed in the same manner as the nonsalient road segment extraction in the previous step. In Figure 2-16A, the black dotted lines represent the extension of a road segment to form a new road segment, and Figure 2-16B illustrates the extracted road segments with junctions as white dots. Although this approach leads to the extraction of most of the roads in rural images, it does not work in the same way on urban and forest images, as the local context for rural images differs from that in urban and forest images.
In the case of urban images, the network of roads may be denser, and their appearance may also differ from that of road segments in a rural image. The road features extracted in Module I were based on local context, i.e., criteria within an image: Module I extracted roads using the geometric and radiometric properties of the road segments, concentrating on local criteria to extract road edges. Module II performs extraction in a global context, considering the whole image. In Module II, the topological properties of roads are used in extraction, to support the extraction process implemented in Module I and to improve upon its results. The road network extracted in Module II has more road segments than that of Module I, as Module II is less constrained. 2.3.2.2 Module II An intrinsic topological characteristic of roads is that they connect places. Roads are constructed along paths that provide the shortest and most convenient way to reach places. This property leads to searching for an optimal connection between places; the method of determining the best connection between two points is therefore important for road extraction. This approach is feasible on low-resolution satellite images, where roads exist as linear bright features forming a network; it is not feasible on high-resolution images, which are more specific concerning individual road segments and their geometric and radiometric properties. In this module, the approach adopted is modified to integrate and process road-like features from various input sources, i.e., lines extracted at different scales. Module II performs extraction over four steps: low-level processing, fusion, graph representation, and road network generation. The information obtained from Module I of the extraction process is passed on as input to Module II. During the initial step of low-level processing, the roads that exist as long bright features are extracted.
These extracted features are then merged with the road edges extracted by local extraction in Module I, in the fusion step of Module II. The graph representation step constructs a graph from the fused road segments of the previous step, with road segments represented by edges and the junctions of road segments as vertices. The final step in Module II uses the output of the graph representation step to generate the road network. The discussion below briefly explains each step. Low-level processing. In this step, road segments are extracted by line extraction from a low-resolution image. This approach returns lines as sets of pixel chains, as well as junction points, with subpixel precision. Some of the extracted lines may represent actual roads, while others may not be roads at all; they may be other features, such as rivers, misidentified as roads. In the analysis of roads, the behavior of several line attributes is important, but the most significant indicator is high curvature. Hence, the lines extracted at low resolution are split into road segments and nonroad segments at points of high curvature, as the probability of a road having a very steep curve is low. If some road segments are misidentified, or not identified at all, they will be identified in the next step, the fusion process. Each extracted line feature classified as a road segment is supported by an extended description, based on the following calculated properties:
* Length.
* Straightness (standard deviation of its direction).
* Width (mean width of the extracted line).
* Constant width (standard deviation of the width).
* Constant radiometric value of the road segment (standard deviation of the intensity value along the segment).
Fusion. In this step, the road segments obtained from the previous step are combined with the roads extracted by the local extraction performed in Module I.
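The extended description listed above can be computed directly from a pixel chain. The following sketch assumes a segment is given as chain points with per-span widths and intensities; this data layout and the function name are hypothetical.

```python
import math

def segment_attributes(points, widths, intensities):
    """Compute the five descriptive attributes for one extracted line segment."""
    def std(xs):
        m = sum(xs) / len(xs)
        return (sum((x - m) ** 2 for x in xs) / len(xs)) ** 0.5

    # Length: sum of distances between consecutive chain points.
    length = sum(math.dist(p, q) for p, q in zip(points, points[1:]))
    # Local directions between consecutive points, for the straightness measure.
    dirs = [math.atan2(q[1] - p[1], q[0] - p[0])
            for p, q in zip(points, points[1:])]
    return {
        "length": length,
        "straightness": std(dirs),            # std of direction
        "width": sum(widths) / len(widths),   # mean width
        "width_constancy": std(widths),       # std of width
        "intensity_constancy": std(intensities),  # std of intensity
    }
```

Low values of the three standard deviations then indicate a straight segment of constant width and uniform brightness, i.e. a strong road candidate.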
After fusion, both types of road segments are stored as one set of linear data. Segments in this linear data set are unified if they lie within a buffer of suitable width and if the two segments have a directional difference less than a set threshold; otherwise they are evaluated as intersections. Overall, after the roads are extracted in this module, the result is a more complete network than was extracted in Module I. However, the process may also produce falsely detected road segments. Next, the extracted output is represented in the form of a network graph. Graph representation. Once the segments are fused, a graph is constructed, with the road segments as edges and points of connectivity as vertices. In cases where two or more segments intersect, only one point (vertex) is retained, to preserve the topology of the road. Attribute values of the road segments, assigned in the low-level processing step of this module, are used to weight the graph by associating every edge with a single weight. At this step of the extraction it is difficult to determine whether a road junction between two segments truly represents a connection of road segments. Thus, an additional hypothesis is generated to determine the connections between the edges of the graph. The following criteria are used to measure the quality of the hypotheses:
* Direction difference between adjacent road segments; either collinearity (within a road) or orthogonality (at a T-junction) is assumed as a reference.
* The absolute length of the connection.
* The relative length of a connection compared to the length of the adjacent road segment with the lower weight.
* An additional constraint that prevents a connection hypothesis from being assigned a higher weight than its adjacent road segments.
A linear fuzzy function is then defined to obtain a fuzzy value for the hypothesis on each of the above criteria; these values are then aggregated into an overall fuzzy value using the fuzzy "and" operation. For example, a fuzzy function is defined for the difference in direction, to distinguish collinearity within a road from orthogonality at a junction. To prefer either a continuation of the road segment or a possible road junction, a fuzzy function with two peaks is used, one at 0° and one at 90°; this supports both collinearity and junctions. Thus, a road connection may be classed as either a collinear road segment or a T-junction. This classification can be refined using the other parameters for evaluating junction hypotheses; for example, the length of the connection compared to the lengths of the road segments to be connected can be used as a weighting function in determining whether the connection is a junction or a road segment, using the fuzzy value defined above. Next, the roads are generated using road seed generation (points or places of interest to which a road connects) as the final step in Module II. Road network generation. Here, the road seeds are used in extracting roads, by determining the optimal path between the seeds representing the origin and destination. The seeds in this step are points of interest, such as buildings and industrial areas. The algorithm for road network generation finds the shortest path using the Dijkstra algorithm on the weight assigned to each road segment. Weights are assigned to road segments depending on their fuzzy values, which vary between 0 and 1, combined with the true distance between the vertices.
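A two-peaked fuzzy membership of the kind described, together with a fuzzy "and", might look like the sketch below. The linear falloff and the 20° tolerance are illustrative choices, not values from the paper.

```python
def direction_fuzzy(angle_deg, tol=20.0):
    """Two-peaked linear fuzzy membership: 1 at 0 deg (collinear continuation)
    and 1 at 90 deg (T-junction), falling to 0 within tol degrees of a peak."""
    a = abs(angle_deg) % 180
    a = min(a, 180 - a)           # fold the angle difference into [0, 90]
    peak_dist = min(a, 90 - a)    # distance to the nearest peak (0 or 90)
    return max(0.0, 1.0 - peak_dist / tol)

def fuzzy_and(*values):
    # Aggregate the per-criterion scores; min is the standard fuzzy "and".
    return min(values)
```

A connection near 0° or 90° scores highly on the direction criterion, and the overall hypothesis score is then limited by its weakest criterion via `fuzzy_and`.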
If a segment does not form a link between vertices, a fuzzy value of 0 is assigned, leading to an infinite weight on the road segment and thereby removing it from the shortest-path calculation during road network generation.

    w_ij = d_ij / r_ij,  if vertices i and j are connected in the original graph and r_ij > 0,
                         where d_ij is the Euclidean distance between vertices i and j;
    w_ij = ∞,            otherwise.    (2-8)

In Equation 2-8, the weight w_ij is assigned based on r_ij, the fuzzy value introduced earlier, with i and j representing the vertices. The distance d_ij between the vertices is combined with the fuzzy value to produce the weights that are used as inputs to the Dijkstra algorithm, which determines the optimal path when generating the road network. Most of the road segments are extracted from an input image through the extraction processes implemented in Module I and Module II. The extracted road segments exist as fragments of a disconnected road network. Since some road segments were not extracted in either Module I or II, the resultant road network is further connected to complete the network, using the functional characteristics of a road along with verification against the image using hypotheses generated in Module III. Module III, the final module of the extraction process, is implemented over three steps. The first step generates link hypotheses, based on which the extracted road segments are connected through the verification and insertion steps of Module III. 2.3.2.3 Module III In this module, information about the utility of a road network, its topographical characteristics, and various factors such as environmental restrictions and the locations of cities and industrial areas are used in the process of linking extracted road segments.
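Equation 2-8 and the subsequent shortest-path search can be sketched with a standard Dijkstra implementation. The graph encoding, as adjacency lists of (neighbor, distance, fuzzy value) tuples, is an assumption made for the example.

```python
import heapq

def edge_weight(dist, fuzzy):
    # Equation 2-8: w_ij = d_ij / r_ij when connected and r_ij > 0, else infinity.
    return dist / fuzzy if fuzzy > 0 else float("inf")

def shortest_path_cost(graph, src, dst):
    """Dijkstra over edges weighted by true distance / fuzzy value.
    graph: {node: [(neighbor, distance, fuzzy_value), ...]}"""
    pq, seen = [(0.0, src)], set()
    while pq:
        cost, node = heapq.heappop(pq)
        if node == dst:
            return cost
        if node in seen:
            continue
        seen.add(node)
        for nbr, dist, fuzzy in graph.get(node, []):
            w = edge_weight(dist, fuzzy)
            if w != float("inf") and nbr not in seen:
                heapq.heappush(pq, (cost + w, nbr))
    return float("inf")
```

An edge with fuzzy value 0 gets infinite weight and is effectively removed, exactly as described for segments that do not form a link.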
The results obtained up to this point, through Module I and Module II, along with information on missing segments, are used again, and the whole road network is reconstructed based on hypotheses generated in this module. These hypotheses are then used in completing the network at this final stage of the three-module extraction process developed by Baumgartner et al. (1999). Figure 2-17. Network completion hypothesis. This figure illustrates the process of extraction using the functional and topological characteristics of the road, explained in detail in this section. Hypotheses for network completion, as implemented in this research, work as follows. A sample network is shown in Figure 2-17, with four nodes A, B, C, and D being considered. Among the set of four nodes, the shortest path in the network is determined; optimal paths along diagonals are also considered for evaluation. These distances are evaluated for the shortest path, as this is the best means of fast and cheap transport among a set of options. The network distance nd in Figure 2-17 depends on the actual length and the road class along which the shortest path is found, whereas the optimal distance od in Figure 2-17 depends on factors such as topography, land use, and environmental conservation, given that this information is readily available for the generation of hypotheses. Generation of link hypotheses. A preliminary link hypothesis is defined between each possible pair of nodes, and a so-called "detour factor" is calculated for each preliminary hypothesis as per Equation 2-9; in Figure 2-17 the calculation is done for each possible pair of nodes (e.g., AD and AC).

    Detour factor = Network distance (nd) / Optimal distance (od)    (2-9)

In this step, potentially relevant link hypotheses are selected.
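The detour-factor computation of Equation 2-9, and the selection of candidate links by locally large detour factors, can be sketched as follows; the threshold value and the candidate data layout are hypothetical.

```python
def detour_factor(network_distance, optimal_distance):
    # Equation 2-9: detour factor = network distance / optimal distance.
    return network_distance / optimal_distance

def select_link_hypotheses(candidates, threshold=1.5):
    """candidates: iterable of (node_pair, network_distance, optimal_distance).
    Keep pairs whose detour factor exceeds the threshold, i.e. links that
    would shorten the network most, ranked from largest detour down."""
    scored = [(pair, detour_factor(nd, od)) for pair, nd, od in candidates]
    return sorted((s for s in scored if s[1] >= threshold),
                  key=lambda s: -s[1])
```

A pair whose network path is much longer than its optimal (straight-line or terrain-optimal) path then rises to the top of the verification queue.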
The selection is based on the detour factor, in the sense that links with locally maximum detour factors are of interest and that there is no preferred direction within the road network. The generated link hypotheses are verified in order of their detour factors; if a link with a higher detour factor is rejected, then the link with the next highest detour factor is considered for verification. Verification is carried out against the image data; whether a detour is accepted depends on whether the hypothesis actually matches a road in the image. Once a link is accepted, it is included in the road network, thus changing the topology of the road network. Link hypotheses, once rejected, are not considered again in the iterative process of hypothesis verification. Verification of hypotheses. The verification of the hypotheses is carried out in relation to the image data. In the verification stage, the roads extracted in the prior modules are used. Here the link hypothesis is verified against the roads extracted using the road seed generation from Module II. Verification of the link hypotheses is carried out by determining the optimal path between the road seeds using the weighted graph. If the graph provides no connection between two end points, the hypothesis is rejected; otherwise, if a path is found, it is inserted into the road network, replaced by a geometrically improved link. Figure 2-18. Segment insertion (new inserted link, new junction, and redundant part of the new link). Insertion of accepted road hypotheses. At this stage, if a road connecting two end points has been verified from a link hypothesis, the new road is inserted into the whole road network, as shown in Figure 2-18. Sections of the new road that overlap with already existing road segments (the redundant part of the new link in Figure 2-18) are eliminated.
In most insertions, a larger portion of the new road remains and is inserted into the network by connecting its two ends to the nearest points on the network (the red dot in Figure 2-18 could have been connected to the blue dot). If the connection point is not the end of an existing segment, a junction is introduced, as per the process explained in Module I. In instances where a completely new link segment is eliminated based on the hypothesis test, no portion of the segment is introduced into the road network. Figure 2-19. Extracted road segments. (Picture courtesy of Baumgartner et al. (1999), Figure 10, Page 9.) Figure 2-19 illustrates a complete road network extracted using the three-module, two-model process developed by Baumgartner et al. (1999). In this method, the road model and context model supported each other through the modules of extraction. Many processes implemented within this technique can be used to develop an individual Semiautomatic road extraction method; as was discussed earlier in this chapter, many of the modules from Automatic approaches are implemented in Semiautomatic approaches. The extraction results thus obtained are evaluated further, based on their connectivity and their deviation from the reference data. The method of extraction discussed above is an example of road feature extraction for a rural road. In the case of urban road extraction, information from sources such as digital surface models, along with contextual information, is needed to make the approach automatic. This chapter was an overview of the various characteristics that affect the road extraction process, and of different approaches to road extraction. Chapters 3 and 4 introduce the Perona-Malik algorithm and Snakes theory.
In our study, a Semiautomatic road feature extraction method is developed using anisotropic diffusion, rather than Gaussian blurring (isotropic diffusion); the diffusion is implemented through the Perona-Malik algorithm (explained in Chapter 3). In Chapter 4, the theory and concept of Snakes, and their implementation for feature extraction, are explained; Snakes will be used in our study to extract road features from diffused image information, using dynamic Snakes. The method of road feature extraction that uses the anisotropic diffusion approach developed by Perona and Malik together with Snakes to extract roads is explained in Chapter 5. Chapter 6 discusses the results obtained, followed by an evaluation and analysis of the results. Chapter 7 concludes the thesis with an overview of the method of extraction implemented in our study and the future work to be pursued in this research. The automation of the initial step of feature identification and the selection of road segments is one of the essential pieces of work to be carried out in the future. Automation of initial identification, using the Kalman filter and profile matching, is explained as a possibility for the initial road identification step, prior to the feature extraction method implementation (Appendix B). CHAPTER 3 ANISOTROPIC DIFFUSION AND THE PERONA-MALIK ALGORITHM An image is, in general, a photometric representation of real-world features. Objects or features from the real world are represented in a digital image as regions composed of groups of pixels, typically with similar intensity values. Features or objects represented in an image may have similar pixel intensity values, at least within each feature or object existing as a region in an image. Ideally, such features are represented as homogeneous regions within an image; for example, buildings, trees, or agricultural land within a high-resolution image may each be represented as a region with similar pixel intensity values overall.
During the capture and conversion of this information into a digital image, noise, or undesired information, is also generated, affecting the representation of the real-world features in the image. The noise exists as blobs on the image, with pixel intensity values different from the overall or average pixel intensity values that represent a particular region or feature. Many fields, such as medical image analysis, use information extracted from images for varied purposes. During the process of extraction, the existence of noise leads to misrepresentation or false feature extraction. A feature extraction method usually extracts the desired features from an image based on shape and feature boundary descriptions, obtained through the edge detection step of the feature extraction method. The existence of noise within an image affects the feature extraction step, as noise results in false edges being detected that may not exist in the real world and should not exist in the representation of the feature in the image. To overcome this problem, noise across the image is minimized by implementing blurring or smoothing operations; this is done at the initial preprocessing step of the feature extraction method. In general, smoothing operations assign each pixel within the input image a new intensity value calculated from the intensity values of the pixels in its neighborhood in the digital image. This process minimizes variation across pixels and consequently reduces the noise within an image. The resultant image is a blurred, or smoothed, variant of the original input image. The image obtained from the preprocessing step is thus significant in the extraction of the desired features. Below, Section 3.1 explains the principles of isotropic and anisotropic diffusion; this is followed by a discussion of the need for, and implementation of, anisotropic diffusion in the Perona-Malik algorithm in Section 3.2.
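As a concrete instance of such a neighborhood smoothing operation, a simple 3×3 mean (box) filter is sketched below; it is a stand-in for the general idea of replacing each pixel by a neighborhood average, not the specific filter used in this thesis.

```python
def box_blur(image):
    """3x3 mean filter: each interior pixel receives the average of its
    neighborhood, reducing pixel-to-pixel variation (borders left unchanged)."""
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y][x] = sum(image[y + dy][x + dx]
                            for dy in (-1, 0, 1)
                            for dx in (-1, 0, 1)) / 9.0
    return out
```

A single bright noise "blob" is pulled toward the surrounding intensity, which is exactly the variance reduction described above, at the cost of also softening genuine boundaries.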
Section 3.2.1 explains the intra-region blurring carried out by the Perona-Malik algorithm, which simultaneously performs local edge enhancement, as explained in Section 3.2.2. The chapter concludes with an illustration of the algorithm's implementation on an image lattice structure (Malik and Perona, 1990). 3.1 Principles of Isotropic and Anisotropic Diffusion Conventional smoothing at the preprocessing step is usually performed with a Gaussian, Laplacian, or similar operator. Blurring with a Gaussian operator assigns each pixel a value based on the weighted average of the local pixel intensities, with weights calculated from a Gaussian distribution kernel (Section 2.2.1). Conventional smoothing techniques perform well at minimizing variation across an image. However, the blurring performed by a conventional technique such as the Gaussian is isotropic in nature: it blurs the whole image in the same fashion in all directions. While achieving the desired reduction of noise and variation, this isotropic property also blurs the boundaries between regions or features, shifting the locations of the actual boundaries, or losing them altogether, when they are sought in the edge detection step. In the road feature extraction method developed in this thesis, the preprocessing step of blurring the image is carried out using the Perona-Malik algorithm, an anisotropic diffusion method, in place of Gaussian blurring, an isotropic diffusion technique. The anisotropic approach blurs regions of an image based on location information; that is, blurring is carried out according to a predefined set of criteria that specify the locations where it may be performed.
In this algorithm, blurring is carried out within regions of an image, while blurring across regions is restricted by a set of criteria; these criteria are discussed in this chapter. The method thus preserves the boundary information in the output blurred image. The blurred image is then used, after edge detection, to extract the desired boundaries between regions or shapes.

K_σ(x, y) = (1 / (2πσ²)) exp(−(x² + y²) / (2σ²))    (3-1)

The idea behind the use of the diffusion equation in image processing arose from the use of the Gaussian filter in multiscale image analysis (Weeratunga and Kamath, 2001). Equation 3-1 illustrates a Gaussian filter K_σ, where σ is the standard deviation and x and y are the coordinates within the generated Gaussian mask. The Gaussian mask, or kernel, generated using Equation 3-1 has cell values corresponding to weights that are used to calculate new pixel intensity values by convolving the kernel with the input image (Section 2.2.1). Through this convolution the image is blurred, each pixel receiving a weighted average value determined by the distribution.

∂I/∂t = ΔI = ∂²I/∂x² + ∂²I/∂y²    (3-2)

Blurring with the Gaussian of Equation 3-1 is equivalent to evolving the image under the linear diffusion equation shown in Equation 3-2. In Equation 3-2, I(x, y, t) is the two-dimensional image I(x, y) at time t = σ²/2, so the time t plays the role of the variance; an increment in the value of t corresponds to an image at a coarser resolution than the original. As the initial condition, t = 0 (zero variance) represents the original image I(x, y).

∂I/∂t = ∇·(c(x, y, t) ∇I(x, y, t))    (3-3)

Equation 3-3 represents a more general form of Equation 3-2, and is used to calculate an output image at any variance t. In Equation 3-3, c(x, y, t) is the diffusion conductance, or diffusivity, of the equation; ∇ and ∇· are the gradient and divergence operators respectively.
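The Gaussian mask of Equation 3-1 can be generated in a few lines; the sketch below is a hypothetical minimal example, with the kernel renormalized so its weights sum to one (a common practical choice not spelled out in the text):

```python
import numpy as np

def gaussian_kernel(sigma, radius):
    """Build the Gaussian mask of Equation 3-1 on a (2r+1) x (2r+1) grid."""
    ax = np.arange(-radius, radius + 1)
    x, y = np.meshgrid(ax, ax)
    k = np.exp(-(x**2 + y**2) / (2.0 * sigma**2)) / (2.0 * np.pi * sigma**2)
    return k / k.sum()   # renormalize so the weights sum to one

k = gaussian_kernel(sigma=1.0, radius=2)
print(k.shape)                    # (5, 5)
print(np.isclose(k.sum(), 1.0))   # True
print(k[2, 2] == k.max())         # True: the weight peaks at the center
```

Convolving an image with this kernel produces the weighted-average blurring described above; larger σ corresponds to a coarser scale.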
The general form illustrated in Equation 3-3 reduces to the linear, or isotropic, diffusion equation of Equation 3-2 if the diffusivity c(x, y, t) is kept constant and independent of the location (x, y) within the input image. This leads to blurring that acts in the same fashion in all directions within the image. Gaussian blurring, implemented using Equations 3-1 and 3-2, is an example of isotropic diffusion: it depends only on the standard deviation σ, and not on the location within the image where the blurring is carried out. When a diffusion method instead blurs regions within an image selectively, based on location criteria, it is known as anisotropic diffusion; the blurring process becomes image dependent, and is no longer the same in all directions or at all locations within the image. The implementation of anisotropic diffusion in images is derived from the principle of heat diffusion: the distribution of the intensity gradient in an image, as a function of space and time, is analogous to the temperature distribution in a region as a function of space and time. The need to restrict diffusion across boundaries between regions in an image, while permitting diffusion within regions and along boundaries, leads to a criterion, implemented in Equation 3-3, based on the diffusion conductance, or diffusivity, c(x, y, t).

∂I/∂t = div(c(x, y, t) ∇I) = c(x, y, t) ΔI + ∇c · ∇I    (3-4)

Equation 3-4 is an anisotropic diffusion equation that evolves from the general diffusion equation of Equation 3-3. In Equation 3-4, c(x, y, t) may also be a symmetric positive-definite tensor that allows diffusion parallel to the gradient and limits diffusion perpendicular to it, thereby restricting blurring across edges.
Here div is the divergence operator, and ∇ and Δ are the gradient and Laplacian operators respectively. Malik and Perona (1990) developed an algorithm that converts this linear diffusion into a nonlinear, or anisotropic, diffusion that takes place depending on location: diffusion occurs within regions and along boundaries, while it is restricted across edges in the image. The anisotropic diffusion implemented in the Perona-Malik algorithm is carried out locally, at the level of each pixel and its neighborhood, based on the diffusivity value c. In addition to the diffusivity c, the conductance K is used to perform blurring within regions while enhancing the local edges; this process is explained later in this chapter. Section 3.2 explains the implementation of anisotropic diffusion using the c and K values in the Perona-Malik algorithm. Figure 3-1. Anisotropic diffusion using the Perona-Malik algorithm. The red block highlights the well defined edge boundary of the intersection. Figure 3-2. Isotropic diffusion using a Gaussian. The green block highlights the blurred, less well defined edge boundary of the same intersection as in Figure 3-1. As can be seen in Figure 3-1, the Perona-Malik algorithm, an anisotropic diffusion process, preserves, and gives a better representation of, the boundaries of road intersections than the boundary information in Figure 3-2, obtained through Gaussian blurring, an isotropic diffusion process. The boundaries of the road intersection are blurred more in Figure 3-2 (green block) than in Figure 3-1 (red block). The road edges extracted after the Perona-Malik algorithm therefore give a more complete and accurate set of road edge information than would result from a Gaussian blurring process. The Perona-Malik algorithm was implemented in the preprocessing stage of the road feature extraction method developed in our study for the following reasons: * Its ability to implement intra-region smoothing without inter-region smoothing.
* Region boundaries remain sharp and coincide with meaningful boundaries at each particular resolution (Malik and Perona, 1990). Section 3.2 further explains the implementation of the Perona-Malik algorithm through intra-region blurring and local edge enhancement, performed using the diffusivity value c and the conductance value K. 3.2 Perona-Malik Algorithm for Road Extraction From a road feature extraction perspective, this algorithm helps retain the edge information that is essential for delineating road edges, while it also preserves the radiometric characteristics of the road across the image by preventing blurring across regions. Hence, using a road's uniform radiometric characteristics, along with semantically meaningful geometric properties representing the road edges, the initial step of road identification and road seed generation could be automated, although this step is performed manually in the method developed in this thesis (Section 5.2.1). The identified road segments are then used as inputs to the feature extraction method implemented using Snakes (Chapter 4). Roads are represented as long network structures with nearly constant width at fine resolutions, and as bright lines in low-resolution images. The diffusion process is implemented on high-resolution images rather than low-resolution ones, since blurring a low-resolution image would make the roads, existing as bright lines, disappear. The process of obtaining a coarse-scale (blurred) image from the original image involves convolving the original image with a blurring kernel. For an image I(x, y) at coarse scale t, where t represents the variance, the output image is obtained by convolving the input image with the Gaussian kernel K of Equation 3-1.
I(x, y, t) = I₀(x, y) * K(x, y, t)    (3-5)

Equation 3-5 expresses this scale-space blurring: the original image I₀ is convolved with a Gaussian kernel whose spread is governed by the variance t. Increasing the time value (variance) produces coarser-resolution images. The success of blurring within regions and along region boundaries, following the principle of the Perona-Malik algorithm, depends on determining the boundaries between regions in the image; this is done based on the value of c(x, y, t). Blurring is carried out within regions depending on the value of the coefficient of conductance, or diffusivity, c. This can be achieved by assigning the value 1 to the diffusivity c within regions, and 0 at the boundaries (Perona and Malik, 1990). However, the conduction coefficient cannot be assigned to each pixel or location a priori, as the boundaries between regions are not known in advance. Instead, the location of boundaries between regions is estimated, as explained in further detail in Section 3.2.1, in order to assign diffusivity values and perform intra-region blurring. 3.2.1 Intra-Region Blurring Assume an original image I(x, y), with t representing the scale to which the image is to be blurred. At a particular scale t, if the location of boundaries between regions were known, the conduction coefficient c(x, y, t), defined in Equation 3-4, could be set to 1 within regions and 0 at the boundaries, as discussed earlier. This would blur within regions while keeping the boundaries sharp and well defined. The problem is that the boundaries at each scale are not known in advance; their location is instead estimated at the scale of the input image (Malik and Perona, 1990), as follows.
Let E(x, y, t) be a potential edge at a particular scale t; it is a vector-valued function with the following properties: its value is 0 if the pixel or location lies within a region, otherwise it is the product of the local contrast K and a unit vector e normal to the edge at the given location: E(x, y, t) = 0 if the pixel is within a region; E(x, y, t) = K e(x, y, t) at an edge point. Figure 3-3. Nonlinear curve. This curve represents the function of gradient magnitude used for estimating boundary locations within an image. Here e is a unit vector normal to the edge at a given point, and K is the local contrast (i.e., the difference in image intensities on the left and right of the edge), equivalent to the flux in a heat diffusion equation. Once an estimate E(x, y, t) of the edge is available, the conduction coefficient c(x, y, t) is set to be a function g(‖E‖) of the magnitude of E. The function g(·) is nonnegative and monotonically decreasing, with g(0) = 1, as illustrated in Figure 3-3. Once the diffusivity has been estimated for all locations within the image, diffusion is carried out in the interior of regions, where E = 0, while diffusion is restricted along boundaries between regions, where the magnitude of E is large; the boundaries of the roads are thus preserved at each scale of the image. The remainder of this section explains how the diffusion coefficient, chosen as a local function of the magnitude of the gradient of the brightness function (Malik and Perona, 1990), preserves and sharpens boundaries through the appropriate selection of the function g(·).

c(x, y, t) = g(‖∇I(x, y, t)‖)    (3-6)

In general, scale-space blurring of images is used to obtain coarse-resolution images; this helps to filter out noise, but also loses a great deal of edge information in the process, blurring the edges in the image.
In anisotropic diffusion as implemented by Malik and Perona (1990), the conduction coefficient, also known as the diffusion conductance (Equation 3-6), is chosen to be an appropriate function of the magnitude of the local image gradient. This choice enhances edges while running forward in time/scale, preserving the stability of the diffusion principle (Malik and Perona, 1990). Section 3.2.2 explains how the edge enhancement process acts locally while the diffusion process steps ahead to derive coarse-scale images. 3.2.2 Local Edge Enhancement This section explains how the edges in an image are enhanced, during the process of blurring within regions, from one scale or time step to the next. Malik and Perona (1990) modeled an edge as a step function convolved with a Gaussian mask. To explain the concept, the edge is assumed to be aligned with the y-axis, so the diffusion equation can be written as in Equation 3-7:

∂I/∂t = div(c(x, y, t) ∇I) = ∂/∂x (c(x, y, t) I_x)    (3-7)

Here c, the diffusivity or conductance coefficient, is chosen to be a function of the gradient of I, as illustrated in Equation 3-8:

c(x, y, t) = g(I_x(x, y, t))    (3-8)

Let φ(I_x) = g(I_x)·I_x denote the flux c·I_x of intensity between pixels along x. The one-dimensional version of the diffusion equation then becomes:

I_t = ∂/∂x φ(I_x) = φ′(I_x)·I_xx    (3-9)

The interest here lies in determining the variation in time t (variance) of the slope of the edge, ∂/∂t (I_x). If c(·) > 0 the function I(·) is smooth, and the order of differentiation may be inverted:

∂/∂t (I_x) = ∂/∂x (I_t) = φ″(I_x)·I_xx² + φ′(I_x)·I_xxx    (3-10)

Instead of differentiating the image with respect to the time step t, the image at a particular scale t is differentiated in space. As explained in Malik and Perona (1990), if the edge is oriented such that I_x > 0, then at the point of inflection I_xx = 0 and I_xxx << 0, as the point of inflection corresponds to the point of maximum slope (Ivins and Porill, 2000).
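The sign behavior of the flux derivative can be checked numerically. Assuming the exponential edge-stopping function g(s) = exp(−(s/K)²) (given later as Equation 3-18), the flux is φ(s) = s·g(s) and φ′(s) = exp(−(s/K)²)(1 − 2s²/K²), which changes sign at s = K/√2; the sketch below verifies this:

```python
import numpy as np

# Flux derivative for g(s) = exp(-(s/K)^2), so phi(s) = s * g(s).
K = 10.0
s = np.array([1.0, 5.0, 7.0, 7.2, 10.0, 20.0])
phi_prime = np.exp(-(s / K) ** 2) * (1.0 - 2.0 * s**2 / K**2)

# phi'(s) is positive below s = K/sqrt(2) ~ 7.07 (small gradients smoothed)
# and negative above it (strong gradients enhanced), matching the text.
threshold = K / np.sqrt(2.0)
print(np.all(phi_prime[s < threshold] > 0))  # True
print(np.all(phi_prime[s > threshold] < 0))  # True
```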
This has the result that in the neighborhood of the point of inflection, ∂/∂t (I_x) has a sign opposite to that of φ′(I_x). If φ′(I_x) > 0, the slope of the edge decreases with time; if φ′(I_x) < 0, the slope of the edge increases with time. There should not be an unbounded increase in the slope of the edge with time, as this would contradict the maximum principle, which states that no new information should be formed in coarse images derived from the original image (Malik and Perona, 1990). Thus a threshold, based on the value of K, is set below which φ(·) is monotonically increasing and above which it is monotonically decreasing, giving the desirable result of blurring small discontinuities while enhancing and sharpening edges (Malik and Perona, 1990). A later section of this chapter explains the whole process of anisotropic diffusion, carried out on a square lattice as an example. 3.3 Anisotropic Diffusion Implementation This section explains anisotropic diffusion on a square lattice with brightness values associated with the vertices; the equation for anisotropic diffusion is discretized for this lattice. In Figure 3-4, the brightness values are associated with the vertices and the conduction coefficients with the arcs. Equations 3-11 and 3-12 are, respectively, the general and discrete representations of anisotropic diffusion for the square lattice shown in Figure 3-4, which represents an image subset.

I_t = div(c(x, y, t) ∇I) = c(x, y, t) ΔI + ∇c · ∇I    (3-11)

I_{i,j}^{t+1} = I_{i,j}^t + λ[c_N·∇_N I + c_S·∇_S I + c_E·∇_E I + c_W·∇_W I]_{i,j}^t    (3-12)

Figure 3-4. Square lattice example. This example illustrates the working of the Perona-Malik algorithm, with the vertices representing the image pixels and the arcs representing the conductances.
In the discrete anisotropic diffusion equation (3-12), a four-neighbor discretization of the Laplacian operator is used, where 0 ≤ λ ≤ 1/4 and N, S, E and W are subscripts for the vertex locations in each direction; here the symbol ∇ represents the difference between nearest neighbors in the lattice structure, not the gradient:

∇_N I_{i,j} = I_{i−1,j} − I_{i,j}
∇_S I_{i,j} = I_{i+1,j} − I_{i,j}
∇_E I_{i,j} = I_{i,j+1} − I_{i,j}
∇_W I_{i,j} = I_{i,j−1} − I_{i,j}    (3-13)

The conduction coefficients, or diffusion conductances, are updated at every iteration as a function of the brightness gradient, with the half-pixel gradients approximated by the corresponding nearest-neighbor differences, as shown in Equation 3-14:

c_{N_{i,j}}^t = g(|∇_N I_{i,j}^t|)
c_{S_{i,j}}^t = g(|∇_S I_{i,j}^t|)
c_{E_{i,j}}^t = g(|∇_E I_{i,j}^t|)
c_{W_{i,j}}^t = g(|∇_W I_{i,j}^t|)    (3-14)

Perona and Malik, in their paper "Scale-space and edge detection using anisotropic diffusion", proved that the image information at the next scale lies between the maximum and minimum values in the neighborhood of the pixel under consideration at the previous time step or scale. Hence, with λ ∈ [0, 1/4] and c ∈ [0, 1], let the maximum and minimum of the neighbors of I_{i,j} at iteration t be (I_M)_{i,j}^t = max{(I, I_N, I_S, I_E, I_W)_{i,j}^t} and (I_m)_{i,j}^t = min{(I, I_N, I_S, I_E, I_W)_{i,j}^t}. The new value at t+1 then lies between the maximum and minimum values in the neighborhood, as illustrated in Equation 3-15:

(I_m)_{i,j}^t ≤ I_{i,j}^{t+1} ≤ (I_M)_{i,j}^t    (3-15)

Hence there can be no local maxima or minima in the interior of the discretized scale space. Expanding the update of Equation 3-12,

I_{i,j}^{t+1} = I_{i,j}^t (1 − λ(c_N + c_S + c_E + c_W)_{i,j}^t) + λ(c_N I_N + c_S I_S + c_E I_E + c_W I_W)_{i,j}^t ≤ (I_M)_{i,j}^t    (3-16)

and similarly,

I_{i,j}^{t+1} ≥ (I_m)_{i,j}^t    (3-17)

The scale-space diffused edges can be obtained using either of the following functions for g(·), as used by Perona and Malik in their work to blur images using anisotropic diffusion.
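The discrete update of Equations 3-12 to 3-14 can be sketched as follows. This is a minimal NumPy illustration, not the thesis implementation: the exponential edge-stopping function is assumed, the half-pixel gradients are approximated by nearest-neighbor differences, and image borders wrap around (via np.roll) for brevity:

```python
import numpy as np

def perona_malik(image, iterations=20, K=15.0, lam=0.25):
    """Discrete Perona-Malik diffusion on a square lattice (a sketch)."""
    g = lambda d: np.exp(-(d / K) ** 2)   # exponential edge-stopping function
    I = image.astype(float)
    for _ in range(iterations):
        # Nearest-neighbor differences (Eq. 3-13); np.roll wraps the borders.
        dN = np.roll(I, 1, axis=0) - I
        dS = np.roll(I, -1, axis=0) - I
        dE = np.roll(I, -1, axis=1) - I
        dW = np.roll(I, 1, axis=1) - I
        # Conductances (Eq. 3-14) and update (Eq. 3-12), with 0 <= lam <= 1/4.
        I = I + lam * (g(dN) * dN + g(dS) * dS + g(dE) * dE + g(dW) * dW)
    return I

# A noisy two-region image: the step edge survives while each region smooths.
rng = np.random.default_rng(1)
img = np.hstack([np.full((16, 8), 50.0), np.full((16, 8), 200.0)])
img += rng.normal(0.0, 5.0, img.shape)
out = perona_malik(img)
print(out.min() >= img.min() and out.max() <= img.max())  # True: maximum principle
```

Because each update is a convex combination of a pixel and its neighbors (λ ≤ 1/4, c ≤ 1), the output obeys the maximum principle of Equation 3-15.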
g(∇I) = exp(−(‖∇I‖ / K)²)    (3-18)

g(∇I) = 1 / (1 + (‖∇I‖ / K)²)    (3-19)

The scale spaces generated by these two functions differ in the edges they favor. The first function (Equation 3-18) prioritizes high-contrast edges over low-contrast edges, whereas the second (Equation 3-19) favors wide regions over smaller ones. This chapter has presented an explanation of the Perona-Malik algorithm and how it detects edges through the scale space of an image using anisotropic diffusion. The main reason for implementing this approach in road extraction is to obtain appropriate edge information at each scale, together with uniform radiometric variance within the desired features. In this thesis, road edges are detected using information from the diffused image, and then extracted using Snakes (deformable contour models). Snakes, as implemented in this thesis, use the information about an edge, gained from the diffused image around the position of each snaxel, to relocate the snaxels closer to the road edges. Chapter 4 provides a detailed discussion of this process, introducing the working of a Snake, the concept of dynamic programming, and the Snake's implementation using dynamic programming. CHAPTER 4 SNAKES: THEORY AND IMPLEMENTATION There are numerous methods to extract road features from the edges detected in an aerial image. In this research, road feature extraction is performed using Snakes (Kass et al., 1988) on an image preprocessed with the Perona-Malik algorithm (Malik and Perona, 1990), explained in Chapter 3. A Snake is a vector spline representation of a desired boundary; it describes the shape of an object or feature in an image that exists as a group of edges detected from the preprocessed image.
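The two edge-stopping functions can be compared numerically; the short sketch below (assuming K = 10 purely for illustration) verifies the properties claimed above:

```python
import numpy as np

K = 10.0
grad = np.linspace(0.0, 50.0, 200)   # gradient magnitudes ||grad I||

g_exp = np.exp(-(grad / K) ** 2)         # Equation 3-18
g_frac = 1.0 / (1.0 + (grad / K) ** 2)   # Equation 3-19

# Both satisfy g(0) = 1 and decrease monotonically, as required of g(.).
print(g_exp[0] == 1.0 and g_frac[0] == 1.0)                        # True
print(np.all(np.diff(g_exp) < 0) and np.all(np.diff(g_frac) < 0))  # True
# The exponential form falls off faster at large gradients, which is why
# it favors high-contrast edges; the rational form decays more slowly.
print(np.all(g_exp[1:] <= g_frac[1:]))                             # True
```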
This vector is obtained by concatenating the snaxels, or points, initially located close to the desired edge of the feature in the image, and then recursively relocating them to align with the desired shape. In our study, toward the objective of extracting road segment edges from an aerial image, an initial set of road point locations, or snaxels, is generated and used as input; the snaxels are then recursively relocated over a series of iterations to align with the edge and recover the desired shape. The reason for implementing Snakes on the Perona-Malik processed image is the algorithm's distinctive behavior of blurring the image within regions while preserving boundaries and edges, as discussed in Chapter 3. This processing retains and further defines the boundaries of the road edges in the image, which is significant for extracting road edges, since it is edge information that the Snake implementation requires. According to Kass et al. (1988), a Snake is an energy-minimizing spline, guided by external forces and influenced by image forces that pull the spline toward desired objects defined and predetermined by the user, as discussed in further detail in this chapter. The Snake is also called the Active Contour Model; 'active' because it exhibits dynamic behavior, recursively relocating the snaxels to align itself to the desired feature in the image. When implementing a Snake on an image, the first step in extracting the desired object is initialization, in which a set of points is placed close to the desired feature. This set of points, or snaxels, can be generated automatically or semi-automatically. In a semi-automatic approach the user selects points in the vicinity of the desired object; in the case of roads, points are placed randomly but close to the road edge features in the image.
In automatic approaches, the desired features are identified automatically, and this identification is followed by the generation of road seeds, points, or snaxels. Snakes relocate the snaxels from their initial positions recursively, moving each snaxel individually to minimize its energy and the overall energy of the Snake, so as to obtain the best possible alignment of the Snake to the shape of the desired feature in the image. The snaxels are iteratively moved closer to the true location of the edge, using either dynamic programming or a gradient descent technique to minimize the overall energy of the Snake, as will be explained in detail in Section 4.2. What follows is a discussion of the theory and concepts behind Snakes and their implementation. The basic mathematical formulation of Snakes is based on Euler's theory, as implemented by Kass et al. (1988), and is explained in Section 4.1; their implementation, and how they are used in the process of road feature extraction, is explained in Section 4.2. 4.1 Theory Snakes are splines, or deformable contours, that take different shapes based on a given set of constraints. Various forces act on a Snake to deform it and align it closely to the desired object; in general these forces can be classified as internal forces, image forces and external forces, as discussed in detail later in this section. The internal forces (Section 4.1.1), arising from the energy developed by bending, impose a smoothing constraint that produces tension and stiffness in the Snake, restricting its behavior so that it fits the desired object using minimal energy. The image forces (Section 4.1.3) push the Snake toward the desired edges or lines. The external constraints (Section 4.1.2) are responsible for placing the Snake near the desired local minimum; they can be specified manually by the user or automated.
Geometric curves can be as simple as a circle or a sine curve, represented mathematically as x² + y² = 1 and y = sin(x), respectively. Mathematical representations of splines, or higher-order curves, are much more complex than sine and circular curves. To initialize a Snake, a spline is produced by picking a desired set of points in the image that lie in the vicinity of the edge of the desired object. Snakes are also called deformable contours, and they are expected to pass through points that have similar characteristics. The snaxels, or road points, that form a Snake are located on pixels that have intensity values similar to the desired object and are spread along the road feature. The Snake is started as a contour traced through this set of points, which represents the edges of the desired feature in the image. Figure 4-1. Snaxels and Snake. The Snake (active contour model), in yellow, with snaxels in red, is relocated iteratively through an energy minimization process to align the Snake to the road edge. The initialization process can be manual or automated; automation of the initialization can be achieved using high-level image processing techniques. Figure 4-1 is a sketch giving a visual illustration of the Snake initialization points, or snaxels (red points), and the Snake as a contour (yellow). The red points represent the initial Snake points, and the yellow spline is the deformable contour, or Snake, whose shape changes with the relocation of the snaxels, also called Snake or road points in our study. Snakes cannot simply detect road edge features and align themselves to the desired feature's boundary or shape; they first need some high-level information (i.e., something to place them near the desired object). In this research, the snaxels, or edge points, are relocated iteratively to deform the Snake, or contour, and align it to the desired feature by minimizing the total energy of the Snake.
Figure 4-2. Scale-space representation of a Snake. A) The orientation of the snaxels forming a Snake. B) The position of a snaxel along x as a function of s. C) The position of a snaxel along y as a function of s. The elements of the Snake, its snaxels (i.e., the points forming the Snake), are influenced by space and time parameters, and can be represented implicitly on that basis as follows. Consider each snaxel position (the red points in Figure 4-1) to have coordinates x(s, t) and y(s, t) that depend on the parameters s (space) and t (time/iteration); this is illustrated in Figure 4-2. In Figure 4-2 the space parameter s represents the spatial location along the contour in the image, and t represents the time step, or iteration, of the energy minimization process. The contour constructed through these snaxels (Snake elements) is affected by the energy developed from the internal and external constraints and the image forces; Sections 4.1.1 through 4.1.3 explain these constraints. These forces move the snaxels over time and space to new coordinates, minimizing the energy of each individual snaxel and of the whole Snake; the objective is to minimize the overall energy so that the Snake comes to lie over the desired edge. The energy minimization process is what allows the Snake to detect the desired edge. Here the energy possessed by the contour, E_snake, is the sum of three energy terms: internal, external and image. The image term, also known as the potential energy, is the force that makes the Snake move toward the desired edge objects; it is used to detect lines, edges, and terminations in the image. The total energy of a Snake is the sum of the energies of the snaxels that form the Snake, or deformable contour. The position of a snaxel can be represented parametrically as shown in Equation 4-1.
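The parametric view above, with snaxel coordinates as functions x(s), y(s) of the normalized parameter s, can be sketched as follows; the four snaxel positions are hypothetical:

```python
import numpy as np

# Snaxel coordinates and their parameter values s_i, evenly spaced in [0, 1].
snaxels = np.array([[0.0, 0.0], [1.0, 2.0], [3.0, 3.0], [6.0, 2.0]])
s = np.linspace(0.0, 1.0, len(snaxels))

# Evaluate the contour x(s), y(s) between snaxels by linear interpolation.
s_query = 0.5
x = np.interp(s_query, s, snaxels[:, 0])
y = np.interp(s_query, s, snaxels[:, 1])
print(x, y)  # 2.0 2.5
```

A spline through the snaxels would serve the same purpose; linear interpolation keeps the sketch short.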
v(s) = (x(s), y(s))    (4-1)

Thus, the contour in Figure 4-2 A can be represented as:

v(s) = [x(s), y(s)]ᵀ, s ∈ [0, 1]    (4-2)

The Snake represented by Equation 4-2 is composed of a number of snaxels whose locations (i.e., their x and y coordinates) are parameterized by s, with s restricted to fall between 0 and 1. The objective is to align the Snake to the desired object; this is achieved by minimizing the total energy of the Snake, that is, the sum of the energies of the individual snaxels forming the Snake or contour:

E_snake = ∫₀¹ E_element(v(s)) ds    (4-3)

Equation 4-3 expresses the total energy of the Snake as the integral of the energies of the individual Snake elements, or snaxels, forming the Snake in Figure 4-1. The energy of the Snake, written as an integral over the snaxels with the forces affecting the energy of each individual element s, is expressed in Equation 4-4:

E_snake = ∫₀¹ E_element(v(s)) ds = ∫₀¹ E_int(v(s)) ds + ∫₀¹ E_extern(v(s, t)) ds + ∫₀¹ E_image(v(s, t)) ds    (4-4)

Here ∫₀¹ E_int(v(s)) ds is the internal constraint, which provides the tension and stiffness requiring the Snake to be smooth and continuous; ∫₀¹ E_extern(v(s, t)) ds is the external constraint, taken from an external operation that imposes an attraction or repulsion on the Snake (such external factors can be human operators or automatic initialization procedures); and ∫₀¹ E_image(v(s, t)) ds, also known as the potential energy, drives the contour toward the desired features of interest, in this case the edges of the road in the image. Figure 4-3. Internal energy effect. A) The shape of the contour under high internal energy. B) The shape of the contour under low internal energy. 4.1.1 Internal Energy The internal energy of a Snake element is composed of two terms: a first-order term controlled by α(s), and a second-order term controlled by β(s).
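A discrete version of Equation 4-4 can be sketched by replacing the integrals with sums over the snaxels and the derivatives with finite differences. Everything here is a hypothetical minimal example: the external term is omitted, and `image_energy` is a placeholder callable:

```python
import numpy as np

def snake_energy(points, image_energy, alpha=1.0, beta=1.0):
    """Discrete total Snake energy: internal terms plus an image term."""
    d1 = np.diff(points, axis=0)        # first differences, approximating v_s
    d2 = np.diff(points, n=2, axis=0)   # second differences, approximating v_ss
    e_int = 0.5 * (alpha * (d1**2).sum() + beta * (d2**2).sum())
    e_img = sum(image_energy(p) for p in points)
    return e_int + e_img

# A straight, evenly spaced snake has no bending (second-difference) energy.
line = np.stack([np.arange(5.0), np.zeros(5)], axis=1)
e = snake_energy(line, image_energy=lambda p: 0.0)
print(e)  # 2.0: four unit stretching terms, no bending, zero image term
```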
The first term makes the Snake act like a membrane, or elastic band, by imposing tension, while the second-order term makes the Snake act like a stiff metal plate to resist bending. The relative values of α(s) and β(s) control the membrane and thin-plate terms (Kass et al., 1988). Thus, the internal energy of the spline can be expressed as in Equation 4-5:

E_int(v(s)) = (α(s) ‖v_s(s)‖² + β(s) ‖v_ss(s)‖²) / 2    (4-5)

In Figure 4-3, the objective is to trace the edge of the circle using a Snake. If the internal energy is kept high the Snake remains stiff; A in Figure 4-3 shows the shape of the contour when the internal energy is high, and B shows its shape when the internal energy is low. Thus, increasing α increases the stiffness of the contour, as it serves as a tension component, while keeping it low leaves the contour more flexible. 4.1.2 External Energy This energy is derived from processes initialized either manually or automatically; either kind of process can control the attractive and repulsive forces used to move the contour model toward the desired features. The energy generated is a spring-like force (Kass et al., 1988). One point is considered fixed, the prior position of a snaxel, and another point is free in the image, the estimated current position of the snaxel, where it may be relocated at a given iteration. The energy is developed between the snaxels (the pixels where the points are located) and another point in the image that is considered fixed. Mathematically, consider u to be a Snake point and v to be a fixed point in the image (Ivins and Porill, 2000); the attractive external energy is then given by:

E_extern = k ‖v − u‖²    (4-6)

This energy is minimal when u = v, when the image point and the Snake point coincide, and it grows with the separation between them, equaling k at unit separation. Along the same lines, a part of the image can repel the contour.
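The spring-like attraction above, and the reciprocal repulsion discussed next, can be sketched as follows; the quadratic form of the attractive energy is an assumption consistent with the behavior described (minimal at u = v, growing with separation):

```python
import numpy as np

def attract(u, v, k=1.0):
    """Spring-like attraction: zero when the snaxel u sits on the fixed point v."""
    return k * np.sum((np.asarray(v) - np.asarray(u)) ** 2)

def repel(u, v, k=1.0):
    """Reciprocal repulsion: large near the fixed point, decaying with distance."""
    return k / np.linalg.norm(np.asarray(v) - np.asarray(u))

u = np.array([3.0, 4.0])
print(attract(u, u))                  # 0.0: attraction is minimal when u = v
print(attract(u, u + [1.0, 0.0]))     # 1.0: equals k at unit separation
print(repel(u, u + [2.0, 0.0]) < repel(u, u + [1.0, 0.0]))  # True: decays
```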
E_extern = k / |v − u| (4-7)

This energy tends to infinity as v approaches u, equals k when |v − u| = 1, and falls toward zero as the separation grows. In Figure 4-4, the fixed end represents a point in the image, and the free end is a Snake point. Spring-like forces develop between the Snake point and the fixed point in the image, and this adds an external constraint to the Snake that is implemented as an external energy component in the development of the Snake.

Figure 4-4. Spring force representation. This force aligns the Snake to the desired edge based on user information, as explained later in this section.

4.1.3 Image (Potential) Energy

To make the Snake move toward the desired feature, we need energy functionals: functions that attract the Snake toward edges, lines, and terminations (Kass et al. 1988). Kass et al. (1988) developed three such functionals; they are shown below, along with their weights. By adjusting the weights of these three terms, the Snake's behavior can be drastically altered. The nearest local minimum of the potential energy is found using dynamic programming, as explained in Section 4.2; dynamic programming is therefore applied in our study for implementing Snakes that extract road-edge features from an aerial image.

E_image = w_line E_line + w_edge E_edge + w_term E_term (4-8)

x ← x + δx (4-9)

Here, the image forces δx produced by each of the terms in Equation 4-8 are derived below in Sections 4.1.3.1 to 4.1.3.3.

4.1.3.1 Line functional (E_line)

This is the simplest image functional of the three terms in Equation 4-8:

E_line = ∫_0^1 I(x(s)) ds (4-10)

If the image functional is simply the image intensity, then depending on the sign of w_line in Equation 4-8, the Snake will be attracted either to dark or to light lines. Thus, subject to its other constraints, the Snake will align with the nearest darkest or lightest contour of image intensity in the vicinity (Kass et al. 1988).
The image force δx is directly proportional to the gradient of the image, since E_line in Equation 4-10 is the image intensity:

δx ∝ −∂E_line/∂x = −∇I(x) (4-11)

Thus, local minima near a snaxel can be found by taking small steps in x:

x ← x − τ ∇I(x) (4-12)

where τ is the positive time step used to find the local minima.

4.1.3.2 Edge functional (E_edge)

Edges in an image can be found using a simple energy functional:

E_edge = −|∇I(x)|² (4-13)

Here the Snake is attracted to contours with large image gradients. Edges can be found using gradient-based potential energies as:

E_edge = −(1/2) ∫_0^1 |∇I(x(s))|² ds (4-14)

As an example, if x = (x, y) has potential energy P(x) = −|∇I(x)|², then the image force acting on the element is given by:

δx ∝ −∂P/∂x = 2 ∇∇I(x) ∇I(x) (4-15)

Hence, strong edges can be found using Equation 4-16:

x ← x + τ ∇∇I(x) ∇I(x) (4-16)

4.1.3.3 Term functional (E_term)

Term functionals are used to find the end points, or terminal points, of contours. To do this, the curvature of level lines in a slightly smoothed image is used:

C(x, y) = G_σ(x, y) * I(x, y)

Here C(x, y) is the image convolved with a Gaussian of standard deviation σ. The gradient direction (angle) is given by:

θ = tan⁻¹(C_y / C_x) (4-17)

With n = (cos θ, sin θ) and n⊥ = (−sin θ, cos θ) the unit vectors along the gradient direction at (x, y) and perpendicular to it, respectively, the curvature of the level contours in C(x, y) is determined using Equation 4-18:

E_term = ∂θ/∂n⊥ = (∂²C/∂n⊥²) / (∂C/∂n) = (C_yy C_x² − 2 C_xy C_x C_y + C_xx C_y²) / (C_x² + C_y²)^(3/2) (4-18)

Equation 4-18 helps attract the Snake to corners and terminations.

4.2 Snakes Implementation

Section 4.1 discussed the theory and working principles of Snakes, based on the various energy functionals. The objective is to get the desired Snake, or deformable contour, to align with the boundary edge; this requires minimizing the overall energy of the Snake, the sum of the energies of the individual snaxels forming the Snake.
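The total energy just described can be sketched discretely with numpy: the contour becomes an array of snaxels, the internal term uses finite differences for V_s and V_ss (assuming constant α and β), and the potential term is a user-supplied callable. A minimal sketch; all names and sample values below are illustrative, not the thesis implementation.

```python
import numpy as np

def snake_energy(v, image_energy, alpha=1.0, beta=1.0):
    """Discrete total Snake energy: internal term (Equation 4-5) plus image term.

    v            -- sequence of (x, y) snaxel positions
    image_energy -- callable giving the potential energy at a point
    """
    v = np.asarray(v, dtype=float)
    d1 = np.diff(v, axis=0)            # approximates V_s  (tension term)
    d2 = np.diff(v, 2, axis=0)         # approximates V_ss (stiffness term)
    e_int = 0.5 * (alpha * (d1 ** 2).sum() + beta * (d2 ** 2).sum())
    e_img = sum(image_energy(p) for p in v)
    return e_int + e_img

flat = lambda p: 0.0                   # toy image with no attracting features
straight = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
bent = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]
```

With a featureless image term, the straight contour has lower energy than the bent one, reflecting the tension and stiffness penalties discussed in Section 4.1.1.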
So the aim is to optimize the deformable contour model by minimizing this energy function, i.e., to find the contour that minimizes the total energy. From the discussion in Section 4.1, the energy E of the active contour model V(s) is:

E(V(s)) = ∫_0^1 P(V(s)) ds + (1/2) ∫_0^1 α |V_s(s)|² ds + (1/2) ∫_0^1 β |V_ss(s)|² ds (4-19)

In Equation 4-19, the first term is the potential energy, and the second and third terms control the tension and stiffness of the Snake. The objective here is to minimize this energy. Minimization can be performed using the gradient descent algorithm or dynamic programming. In this research, both methods were tried, and dynamic programming was chosen because of its ability to trace the edge better than the gradient descent algorithm. Chapter 5 illustrates and explains the difference in results between the two methods. Dynamic programming does better because it can restrict the detection of local minima to a localized region around the location of each snaxel. The energy function E(x), as in Equation 4-19, can be minimized by changing the variable by a small value δx, where x represents a snaxel position in the (x, y) coordinate system:

x ← x + δx

By linear approximation, an expression for the new energy can be obtained, as expressed in Equation 4-20:

E(x + δx) ≈ E(x) + (∂E/∂x) δx (4-20)

Hence, choosing the step δx = −κ ∂E/∂x (with κ > 0) reduces, or minimizes, the energy. Thus, the energy function is modified as follows:

E(x + δx) ≈ E(x) − κ (∂E/∂x)² (4-21)

The second term in Equation 4-21, with its negative sign and the fact that it is squared, makes certain that the energy E will decrease upon each iteration, until a minimum is reached. Section 4.2 further illustrates and explains the implementation of Snakes using dynamic programming.
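The descent step of Equations 4-20 and 4-21 can be demonstrated on a toy one-dimensional energy; the quadratic potential below is made up purely to show that stepping by δx = −τ ∂E/∂x drives the energy toward a minimum.

```python
def descend(x, grad_E, tau=0.1, iters=100):
    # Equations 4-20/4-21: step x by dx = -tau * dE/dx so that E decreases.
    for _ in range(iters):
        x = x - tau * grad_E(x)
    return x

# Toy potential E(x) = (x - 3)^2 with gradient 2(x - 3); the minimum is at x = 3.
x_min = descend(10.0, lambda x: 2.0 * (x - 3.0))
# x_min converges toward 3.0
```

Each iteration contracts the distance to the minimum by a constant factor here; a Snake implementation applies the same update to every snaxel coordinate, using the image gradient in place of the analytic one.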
In Section 4.2, the principle of dynamic programming is explained: Section 4.2.2 illustrates it with a capital budgeting problem, an example of dynamic programming implementation, and Section 4.2.3 explains the implementation of dynamic programming to minimize the energy of a Snake.

4.2.1 Dynamic Programming for Snake Energy Minimization

Dynamic programming determines a minimum by using a search technique within given constraints. The process is a discrete, multistage decision process. Dynamic programming, when applied to minimizing the energy of a Snake, or deformable contour, treats the locations of snaxels, or pixels, as stages. Here the decision to relocate a snaxel to a new location, to minimize the energy, is made by restricting the movement of the snaxel to a window around its present location. Section 4.2.2 explains the concept of dynamic programming using an illustration (Trick, 1997), giving an understanding of the principle and leading to its implementation. Section 4.2.3 then explains the implementation of dynamic programming in Snakes to minimize the total energy of the Snake, so as to optimally orient the Snake close to the desired edge in the image.

4.2.2 Dynamic Programming

This section explains the principle of dynamic programming, with a capital budgeting problem as an example. In this demonstration, the objective is to maximize the firm's revenue from the allocated funds.

Problem definition. A corporation has $5 million to allocate to its three plants for possible expansion. Each plant has submitted a number of proposals on how it intends to spend the money. Each proposal gives a cost of expansion (c) and the total revenue expected (r). The following table gives the proposals generated.

Table 4-1. Proposals
           Plant 1      Plant 2      Plant 3
Proposal   C1    R1     C2    R2     C3    R3
1          0     0      0     0      0     0
2          1     5      2     8      1     4
3          2     6      3     9      -     -
4          -     -      4     12     -     -

Solution.
There is a straightforward enumeration approach to solving this problem, but it is computationally infeasible; here, the dynamic programming approach is used instead. It is assumed in this problem that if the allocated money is not spent, it is lost; hence, the objective is to utilize the entire allocated amount. The problem is split into three stages, each stage representing the money allocated to a plant: stage 1 represents money allocated to Plant 1, and stages 2 and 3 represent money allocated to Plants 2 and 3, respectively. In this approach, allocation is ordered first to Plant 1, then to Plants 2 and 3. Each stage is further divided into states. A state carries the information required to go from one stage to the next. In this case the states for stages 1, 2 and 3 are as follows:

{0, 1, 2, 3, 4, 5}: amount of money spent on Plant 1, denoted x1;
{0, 1, 2, 3, 4, 5}: amount of money spent on Plants 1 and 2, denoted x2; and
{5}: amount of money spent on Plants 1, 2 and 3, denoted x3.

Thus, each stage is associated with a revenue, and to make a decision at Stage 3, only the amount spent on Plants 1 and 2 needs to be known. As can be seen from the states above, in states x1 and x2 we have a set of options for the amount that can be invested, whereas state x3 has only the single option of 5: the total amount invested in Plants 1, 2 and 3 must equal $5 million, since we cannot spend more than $5 million, and any unspent funds are lost per the problem definition.

Table 4-2. Stage 1 computation
Capital available (x1)   Optimal proposal   Revenue for Stage 1
0                        1                  0
1                        2                  5
2                        3                  6
3                        3                  6
4                        3                  6
5                        3                  6

The following computation at each stage illustrates the working principle of dynamic programming.
In Table 4-2, for each value of the available capital (x1), we have the optimal proposal and the revenue obtained from investing that amount in Plant 1, inferred from Table 4-1. Next, the process evaluates the best solution for Plants 1 and 2 in Stage 2, with the predefined set of options for the state x2. At Stage 2, to calculate the best revenue for a given state x2, the process goes through all the Plant 2 proposals, allocates the corresponding funds to Plant 2, and then uses the remainder optimally for Plant 1, based on the information in Table 4-2. As an illustration, suppose the state is x2 = 4; then in Stage 2 one of the following proposals could be implemented. From Table 4-1, if we select a particular proposal for Plant 2 in Stage 2, the remainder of the amount is utilized for Plant 1. Table 4-3 below gives the total revenue for each combination of proposals for Plants 1 and 2.

Table 4-3. Proposal revenue combinations for x2 = 4
Plant 2 proposal   Plant 2 revenue   Funds remaining for Stage 1   Maximum revenue from Plant 1   Total revenue for Stages 1 and 2
1                  0                 4                             6                              6
2                  8                 2                             6                              14
3                  9                 1                             5                              14
4                  12                0                             0                              12

Thus, the best choice for Plants 1 and 2 would be either proposal 2 for Plant 2 with proposal 3 for Plant 1, returning a revenue of 14, or proposal 3 for Plant 2 with proposal 2 for Plant 1, also returning a revenue of 14. Further, Table 4-4 lists the set of options for the state x2 in Stage 2, with the corresponding optimal proposal for each option and the total revenue from Stages 1 and 2. Stage 3 is then considered, with only one option for the state, x3 = 5.

Table 4-4.
Stage 2 computation
Capital available (x2)   Optimal proposal   Revenue for Stages 1 and 2
0                        1                  0
1                        1                  5
2                        2                  8
3                        2                  13
4                        2 or 3             14
5                        4                  17

Along the same lines, computations are carried out for Stage 3, but here the capital available is x3 = 5. Once again, the process goes through all the proposals for this stage, determines the amount of money remaining, and uses Table 4-4 to decide the previous stages. From Table 4-1, Plant 3 has only two proposals:

* Proposal 1 gives revenue 0 and leaves 5. From Table 4-4 the previous stages give 17; hence a total revenue of 17 is generated.
* Proposal 2 gives revenue 4 and leaves 4. From Table 4-4 the previous stages give 14; hence a total revenue of 18 is generated.

Hence, the optimal solution is to implement proposal 2 at Plant 3, proposal 2 or 3 at Plant 2, and proposal 3 or 2 (respectively) at Plant 1. Each option gives a revenue of 18. The above example illustrates the recursive procedure of this approach: at any particular state, all decisions for the future are made independently of how that state was reached. This is the principle of optimality, and dynamic programming rests on this assumption (Trick, 1997). The following recursion is used to perform the dynamic programming calculation. If r(k_j) is the revenue for proposal k_j at Stage j, and c(k_j) the corresponding cost, let f_j(x_j) be the best revenue attainable in state x_j at Stage j. Then

f_1(x_1) = max over {k_1 : c(k_1) ≤ x_1} of r(k_1)

and

f_j(x_j) = max over {k_j : c(k_j) ≤ x_j} of [ r(k_j) + f_{j−1}(x_j − c(k_j)) ]

This formula computes the revenue function in the forward procedure; it is also possible to compute it in a backward procedure that gives the same result. Using the same principle, dynamic programming is implemented in the next section for the energy minimization of Snakes.
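The forward recursion above can be transcribed directly into a few lines of Python; the sketch below encodes the proposals of Table 4-1 and reproduces the optimal revenue of 18 found in the worked example. (The recursion here allows spending at most the budget, which gives the same answer, since unspent funds earn nothing.)

```python
# Proposals per plant as (cost, revenue) pairs, from Table 4-1.
plants = [
    [(0, 0), (1, 5), (2, 6)],           # Plant 1
    [(0, 0), (2, 8), (3, 9), (4, 12)],  # Plant 2
    [(0, 0), (1, 4)],                   # Plant 3
]

def best_revenue(budget):
    # Forward recursion: f_j(x) = max over k with c(k) <= x of r(k) + f_{j-1}(x - c(k)).
    f = [0] * (budget + 1)              # f_0(x) = 0 for all states x
    for proposals in plants:
        f = [max(r + f[x - c] for (c, r) in proposals if c <= x)
             for x in range(budget + 1)]
    return f[budget]

print(best_revenue(5))   # 18, matching the worked example
```

Each pass over a plant replaces the value function of the previous stage, exactly as Tables 4-2 and 4-4 are built stage by stage.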
4.2.3 Dynamic Snake Implementation

In an analogous way to the implementation in Section 4.2.2, the snaxels (point locations along the Snake) are relocated in the deformable model based on the energy minimization procedure, much as allocations were chosen stage by stage in Section 4.2.2. Where in Section 4.2.2 the expenditure was restricted to a total of $5 million, in Snakes the movement of a snaxel is restricted to a search window around its current position. The objective of the process is to minimize the total energy by minimizing the energy at each stage (i.e., at each snaxel location in the model). Figure 4-5 illustrates the movement of a snaxel within its search window, and the changing orientation of the Snake. Here, each snaxel is analogous to a stage, and the positions in the search window represent the states. At any snaxel position, the energy is given by the sum of the energies at the preceding position and the current snaxel position. The minimal sum of these energies is retained as the optimal value. The process continues through all the snaxels, and at the end of each iteration of the minimization process the snaxels move toward the new locations that generated the optimal path in each of their neighborhoods, as in Figure 4-5. Here the optimal path is equivalent to the minimization of the total energy.

Figure 4-5. Dynamic Snake movement within a neighborhood window.

The total energy of the Snake is given by:

E(v_1, v_2, ..., v_n) = E_1(v_1, v_2) + E_2(v_2, v_3) + ... + E_{n−1}(v_{n−1}, v_n) (4-24)

where each variable v_i, or snaxel, is allowed to take m possible locations, generally corresponding to adjacent pixel locations within a search neighborhood. Each new snaxel location v_i, corresponding to the state variable in the i-th decision stage, is obtained by dynamic programming as follows: a sequence of optimal value functions {s_i}, i = 1, ..., n−1, is generated.
Each function s_i, one per stage (i.e., snaxel), is obtained by a minimization performed over v_i. To minimize Equation 4-24 with n = 5, the state variable is minimized at each of the n snaxel locations; this requires minimizing the sum of the energy between the snaxel location under consideration and its preceding location, as in the illustration in Section 4.2.2. Hence, for n = 5, we would have the following energy minimization at each stage:

s_1(v_2) = min over v_1 of { E_1(v_1, v_2) }
s_2(v_3) = min over v_2 of { s_1(v_2) + E_2(v_2, v_3) }
...
s_4(v_5) = min over v_4 of { s_3(v_4) + E_4(v_4, v_5) }
min over v_1, ..., v_5 of E = min over v_5 of { s_4(v_5) }

Thus, in general:

s_k(v_{k+1}) = min over v_k of { s_{k−1}(v_k) + E_k(v_k, v_{k+1}) } (4-25)

Considering Equation 4-25, with k representing the stage and v_k the states, the recurrence relation used to compute the optimal value function for the deformable contour is given by:

s_k(v_{k+1}) = min over v_k of { s_{k−1}(v_k) + E_ext(v_k) + |v_{k+1} − v_k|² } (4-26)

Assuming that the possible states, or new locations, of a snaxel lie in a 3x3 window around its current location, there are nine possible states per stage (i.e., per snaxel location). In this case, the cost associated with each of these possible states is equivalent to the internal energy of the snaxel at that location. The objective is to minimize this energy over the n snaxel points using Equation 4-26. The optimal deformable contour is successively obtained through an iterative process, repeated until E_min(t) no longer changes with time. This approach is significant because it enforces constraints on the movement of the Snake, which is not possible in the gradient descent approach to minimizing the Snake energy. Hence, we get better results using the dynamic programming approach than with the gradient descent algorithm. Chapter 5 explains the overall extraction process using anisotropic diffusion and dynamic Snakes.

CHAPTER 5
METHOD OF EXTRACTION

Numerous methods exist to extract road features from an aerial image.
Most feature extraction methods are implemented using a combination of image processing techniques drawn from the various levels of an image processing system. Road representation varies from image to image depending on the resolution of the image, the weather conditions prevailing at the time of the photograph, and the sun's position, as was discussed in Chapter 2. Hence, it is very difficult to have a single common method to extract roads from any image. To overcome the hurdle of implementing new methods to identify and extract road features for each image, depending on its nature, recent research has targeted the development of a global method, using a combination of image processing techniques, to extract road features from any aerial image. Our study has developed a feature extraction method that could be implemented as an initial road extraction step in such a global model, or as an independent semiautomatic road extraction method. This method evolved through stages, with implementation of the Perona-Malik algorithm and Snakes (deformable contour models) using dynamic programming. Section 5.3 explains in detail this stage-by-stage evolution of the method during development, and is followed by a detailed explanation of the implemented method of road feature extraction.

5.1 Technique Selection

A generic feature extraction method is a three-step process involving pre-processing, edge detection and feature extraction. The road-edge feature extraction method developed in our study evolved over stages, and implements a combination of image processing techniques at each stage. At each stage, a method developed during the research was inspected and then evaluated, based on its ability to extract road-edge features from an aerial image. The roads extracted at each stage were visually inspected and compared to the desired road edge location in the image.
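The first two steps of this generic three-step process can be sketched with numpy alone; the Gaussian kernel, Sobel kernel, toy image, and function names below are illustrative assumptions rather than the thesis implementation, and a Stage 4 variant would substitute Perona-Malik diffusion for the Gaussian smoothing.

```python
import numpy as np

def convolve2d(img, kernel):
    # Minimal 'same'-size convolution with zero padding (numpy only).
    kh, kw = kernel.shape
    pad = np.pad(img.astype(float), ((kh // 2,) * 2, (kw // 2,) * 2))
    out = np.zeros(img.shape, dtype=float)
    for y in range(img.shape[0]):
        for x in range(img.shape[1]):
            out[y, x] = np.sum(pad[y:y + kh, x:x + kw] * kernel[::-1, ::-1])
    return out

# Step 1 -- pre-processing: 3x3 Gaussian smoothing kernel.
gauss = np.array([[1, 2, 1], [2, 4, 2], [1, 2, 1]], dtype=float) / 16.0
# Step 2 -- edge detection: Sobel derivative kernel.
sobel_x = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)

def edge_map(img):
    smoothed = convolve2d(img, gauss)
    gx = convolve2d(smoothed, sobel_x)
    gy = convolve2d(smoothed, sobel_x.T)
    return np.hypot(gx, gy)  # step 3 (feature extraction) would fit a Snake here

# Toy image: a bright two-pixel-wide "road" stripe on a dark background.
img = np.zeros((6, 6))
img[:, 2:4] = 1.0
E = edge_map(img)
```

The resulting edge map peaks along the two sides of the stripe, which is exactly the surface a dynamic Snake would then be attracted to.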
Methods were developed in stages, using a combination of image processing techniques, until the extracted roads were close to the desired, or actual, road edges in the aerial image, based on visual inspection and comparison of the results.

Table 5-1. Stages of development
Step                 Stage 1           Stage 2          Stage 3         Stage 4
Pre-processing       Gaussian          Gaussian         Gaussian        Perona-Malik Algorithm
Edge detection       Sobel             Sobel            Sobel           Perona-Malik Algorithm
Feature extraction   Hough Transform   Gradient Snake   Dynamic Snake   Dynamic Snake

Table 5-1 briefly lists the various image-processing techniques implemented over the stages of developing the method of extraction. The road edges extracted at Stage 4 (Table 5-1) gave results close to the desired road edge locations in the image. The method developed in Stage 4 combined the Perona-Malik algorithm (Chapter 3) with Dynamic Snakes (Chapter 4). Results obtained using Stage 3 and Stage 4 were quite close to the desired road edges upon