Classification of Building Infrastructure and Automatic Building Footprint Delineation Using Airborne Laser Swath Mapping Data

Permanent Link: http://ufdc.ufl.edu/UFE0023693/00001

Material Information

Title: Classification of Building Infrastructure and Automatic Building Footprint Delineation Using Airborne Laser Swath Mapping Data
Physical Description: 1 online resource (215 p.)
Language: english
Creator: Caceres, Jhon
Publisher: University of Florida
Place of Publication: Gainesville, Fla.
Publication Date: 2008


Subjects / Keywords: alsm, building, classification, footprint, lidar, mapping, modelling, urban
Civil and Coastal Engineering -- Dissertations, Academic -- UF
Genre: Civil Engineering thesis, Ph.D.
bibliography   ( marcgt )
theses   ( marcgt )
government publication (state, provincial, territorial, dependent)   ( marcgt )
born-digital   ( sobekcm )
Electronic Thesis or Dissertation


Abstract: Three-dimensional (3D) models of urban infrastructure comprise critical data for planners working on problems in wireless communications, environmental monitoring, civil engineering, and urban planning, among other tasks. Photogrammetric methods have been the most common approach to date to extract building models. However, Airborne Laser Swath Mapping (ALSM) observations offer a competitive alternative because they overcome some of the ambiguities that arise when trying to extract 3D information from 2D images. Regardless of the source data, the building extraction process requires segmentation and classification of the data and building identification. In this work, approaches for classifying ALSM data, separating building and tree points, and delineating building footprints from the classified data are described. Digital aerial photographs are used in some cases to verify results, but the objective of this work is to develop methods that can work on ALSM data alone. A robust approach for separating tree and building points in ALSM data is presented. The method is based on supervised learning of the classes (tree vs. building) in a high-dimensional feature space that yields good class separability. Features used for classification are based on the generation of local mappings, from three-dimensional space to two-dimensional space, known as "spin images" for each ALSM point to be classified. The method discriminates ALSM returns in compact spaces and even where the classes are very close together or overlapping spatially. A modified Hough Transform algorithm is used to orient the spin images, and the spin image parameters are specified such that the mutual information between the spin image pixel values and class labels is maximized. This new approach to ALSM classification allows us to fully exploit the 3D point information in the ALSM data while still achieving good class separability, which has been a difficult trade-off in the past.
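The spin-image feature described above maps each point's 3D neighborhood into a 2D histogram over cylindrical coordinates (radial distance from the point's normal axis, signed height along the normal). The sketch below is a minimal illustration of that mapping only; the function name, parameter values, and NumPy formulation are assumptions for this example, not the dissertation's implementation.

```python
import numpy as np

def spin_image(p, n, cloud, radius=6.0, n_bins=24):
    """Minimal spin-image sketch: project neighbors of an oriented
    point (p, n) into (alpha, beta) cylindrical coordinates and bin
    them into a 2D histogram.  Parameter values are illustrative.

    p     : (3,) ALSM point being classified
    n     : (3,) unit normal of its fitted tangent plane
    cloud : (N, 3) neighboring ALSM returns
    """
    d = cloud - p                      # offsets from the oriented point
    beta = d @ n                       # signed distance along the normal
    # alpha: radial distance from the normal axis through p
    alpha = np.sqrt(np.maximum(np.sum(d * d, axis=1) - beta ** 2, 0.0))
    keep = (alpha < radius) & (np.abs(beta) < radius)   # support cylinder
    img, _, _ = np.histogram2d(alpha[keep], beta[keep], bins=n_bins,
                               range=[[0.0, radius], [-radius, radius]])
    return img
```

In the pipeline described here, the normal would come from the modified Hough Transform tangent-plane fit, and the histogram resolution and support size are the parameters tuned by maximizing mutual information with the class labels.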
Supported by the spin image analysis for obtaining an initial classification, an automatic approach for delineating accurate building footprints is presented. It has long been recognized that laser pulses striking building edges can produce very different first and last return elevations. However, in older generation ALSM systems (<50 kHz pulse rates) such points were too few and far between to delineate building footprints precisely. Furthermore, without the robust separation of nearby trees and vegetation from the buildings, simply extracting ALSM shots where the elevation of the first return was much higher than the elevation of the last return was not a reliable means of identifying building footprints. However, with the advent of ALSM systems with pulse rates in excess of 100 kHz, and by using spin-image-based segmentation, it is now possible to extract building edges from the point cloud. A refined classification resulting from incorporating "on-edge" information is developed for obtaining quadrangular footprints. The footprint fitting process involves line generalization, least squares-based clustering, and dominant point finding for segmenting individual building edges. In addition, an algorithm for fitting complex footprints using the segmented edges and data inside footprints is also proposed.
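The first/last return cue described above is easy to state concretely: flag any pulse whose first and last return elevations differ by more than a threshold as a candidate edge hit. The sketch below is an illustration of that filter only; the function name and the 2 m threshold are assumptions, and, as noted above, such candidates are only reliable after tree points have been removed by a prior classification.

```python
import numpy as np

def edge_candidates(first_z, last_z, min_drop=2.0):
    """Flag laser pulses whose first and last return elevations differ
    by more than min_drop meters -- candidate building-edge hits.
    The threshold is illustrative; vegetation also produces large
    first/last differences, so candidates must be filtered by a prior
    building/tree classification."""
    first_z = np.asarray(first_z, dtype=float)
    last_z = np.asarray(last_z, dtype=float)
    return (first_z - last_z) > min_drop
```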
General Note: In the series University of Florida Digital Collections.
General Note: Includes vita.
Bibliography: Includes bibliographical references.
Source of Description: Description based on online resource; title from PDF title page.
Source of Description: This bibliographic record is available under the Creative Commons CC0 public domain dedication. The University of Florida Libraries, as creator of this bibliographic record, has waived all rights to it worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law.
Statement of Responsibility: by Jhon Caceres.
Thesis: Thesis (Ph.D.)--University of Florida, 2008.
Local: Adviser: Shrestha, Ramesh L.
Local: Co-adviser: Slatton, Kenneth C.

Record Information

Source Institution: UFRGP
Rights Management: Applicable rights reserved.
Classification: lcc - LD1780 2008
System ID: UFE0023693:00001





© 2008 Jhon Caceres


To my Mom


ACKNOWLEDGMENTS

I would like to thank my advisor Dr. Ramesh Shrestha and my co-advisor Dr. Clint Slatton for their guidance and constant support through my academic experience in Florida. I also would like to thank Dr. Bill Carter for his advice and for letting me. To all of them, I would like to express my gratitude for all their efforts that allowed me to learn a lot about Lidar technology. I would like to thank Dr. Jasmeet Judge and Dr. Alper Ungor, members of my committee, for their comments and the knowledge I could obtain by being in their classes. Finally, I would like to thank my colleagues in our research group and Paula and Lucas for being my friends in Gainesville.


TABLE OF CONTENTS

ACKNOWLEDGMENTS .... 4
LIST OF TABLES .... 8
LIST OF FIGURES .... 9
ABSTRACT .... 13

CHAPTER

1 INTRODUCTION .... 15
    1.1 Classification and Modeling Strategies .... 17
        1.1.1 Classification Strategies .... 17
        1.1.2 Modeling Strategies .... 20
    1.2 Contributions and Proposal Outline .... 22

2 AIRBORNE LASER SWATH MAPPING AND THEORETICAL BACKGROUND .... 26
    2.1 Airborne Laser Swath Mapping Background .... 26
        2.1.1 Parameters Affecting the Scanning Pattern .... 29
        2.1.2 Laser Pulse Characterization and With-Target Interaction .... 31
    2.2 ALSM System Description .... 33
    2.3 Theoretical Background .... 35
        2.3.1 Pattern Recognition and Computer Vision .... 35
        2.3.2 Bayesian Classification .... 36
            Pdf definition by Parzen windowing .... 37
            Pdf definition by k-nearest neighbor .... 38
        2.3.3 Clustering .... 38
        2.3.4 Neighborhood Systems .... 40
        2.3.5 Spin Images .... 42

3 CLASSIFICATION OF BUILDING INFRASTRUCTURE IN AIRBORNE LIDAR POINTS USING SPIN IMAGES .... 43
    3.1 Background .... 44
    3.2 Data Acquisition and Pre-Processing .... 47
    3.3 Classification Approach .... 49
        3.3.1 Point Feature Computation through Spin Images .... 50
        3.3.2 Tangent Plane Fitting through a Modified Hough Transform Algorithm .... 54
        3.3.3 Training .... 61
        3.3.4 Classification .... 63
    3.4 Sensitivity Analysis .... 64
        3.4.1 Spin Image Parameter Selection by Using Mutual Information .... 66
        3.4.2 Sensitivity Analysis Results .... 72
    3.5 Algorithm Complexity Analysis .... 80
        3.5.1 Spin Image Computation .... 80
        3.5.2 K-Nearest Neighbor Search .... 81
        3.5.3 Classification .... 81
        3.5.4 Complexity in Time .... 82
    3.6 Results .... 82
        3.6.1 Classification of Suburban Data .... 83
        3.6.2 Campus Data Classification .... 86

4 IMPROVED CLASSIFICATION OF BUILDING INFRASTRUCTURE FROM AIRBORNE LIDAR DATA USING SPIN IMAGES AND FUSION WITH GROUND-BASED LIDAR .... 100
    4.1 Review on Processing Methods for Ground-Based Scanner Data .... 100
    4.2 Data Acquisition and Pre-Processing .... 101
    4.3 Classification Results .... 103
        4.3.1 Classification of ALSM Data .... 105
        4.3.2 Classification of M-TLS Data .... 105
        4.3.3 Classification of Fused Data .... 105
        4.3.4 Roof Classification .... 110

5 AUTOMATIC BUILDING FOOTPRINT DELINEATION FROM ON-EDGE RETURN IDENTIFICATION IN LIDAR DATA .... 114
    5.1 Introduction .... 114
    5.2 Literature Review .... 114
        5.2.1 Photograph-Based Techniques .... 114
        5.2.2 Methods Based on Data Fusion .... 117
        5.2.3 Lidar-Alone-Based Methods .... 118
    5.3 Method .... 121
        5.3.1 Scanning Process Simulation .... 123
        5.3.2 System Setting .... 126
        5.3.3 Expected Footprint Accuracy .... 128
            Photogrammetric accuracy .... 130
            Lidar-based accuracy .... 132
        5.3.4 Automation .... 137
    5.4 Method Steps .... 138
        5.4.1 System Setting and Data Capture .... 139
        5.4.2 Pre-Processing .... 140
        5.4.3 Classification .... 146
            Ground/non-ground classification .... 146
            Building/non-building and edge/non-edge classification .... 149
        5.4.4 Modeling .... 152
            Individual building segmentation by applying local adaptive majority filters .... 153
            Edge segmentation by point clustering and return-under attribute knowledge .... 156
            Footprint fitting .... 168
    5.5 Results .... 174
        5.5.1 Footprint Delineation Results .... 174
        5.5.2 Accuracy Results .... 179
        5.5.3 Comparison with Other Footprint Delineation Methods .... 181
        5.5.4 Enhanced Building Description through Segmentation Intermediate Results .... 187

6 CONCLUSIONS AND FUTURE WORK .... 191
    6.1 Classification of Building Infrastructure in Airborne Lidar Points Using Spin Images .... 191
    6.2 Improved Classification of Building Infrastructure from Airborne Lidar Data Using Spin Images and Fusion with Ground-Based Lidar .... 193
    6.3 Automatic Building Footprint Delineation from On-edge Return Identification in Lidar Data .... 195

APPENDIX: PERPENDICULAR DISTANCE LEAST SQUARES LINE FITTING .... 197
LIST OF REFERENCES .... 209
BIOGRAPHICAL SKETCH .... 215


LIST OF TABLES

3-1 Correlation of mutual information (MI) values and classification accuracy rates (correctness percentage) for parameter sets at 2° angle resolution .... 74
3-2 Classification accuracy rates of campus data points at R=6 m, bZ=0.50 m and aR=2 .... 91
3-3 Classification accuracy rates of campus data points at R=6 m, bZ=0.25 m and aR=2 .... 97
3-4 Classification accuracy rates of campus data points for the combined classifier .... 98
4-1 Classification accuracy percentages as a function of training data sets and fused data combinations .... 107
5-1 Probability of a laser pulse hitting an edge for different ALSM system configurations .... 128
5-2 RMSE for extracted building corners with respect to GPS measurements .... 180


LIST OF FIGURES

1-1 Scan line over an urban landscape .... 19
1-2 Simplified photogrammetric production line .... 21
2-1 ALSM system components and scanning process .... 28
2-2 Snapshot of an ALSM system while scanning .... 30
2-3 Pulse characteristics .... 32
2-4 ALSM systems at UF .... 34
2-5 Neighborhood systems for ALSM points .... 42
3-1 ALSM scan line boundaries of the data taken in Gainesville and at the University of Florida campus .... 48
3-2 Perspective view of a surface built by using preprocessed ALSM data .... 49
3-3 Spin image generation .... 53
3-4 Points in feature space and their corresponding mapping in the Hough transform parameter space .... 58
3-5 Suburban Gainesville area for classification of ALSM data (SW 34th St. and SW 2nd Ave.) .... 73
3-6 Correlation charts between mutual information (MI) values and classification rates .... 77
3-7 Classification results for bZ=0.25 m, R=4 m, and aR=2 .... 79
3-8 Suburban Gainesville area for classification of ALSM data (SW 34th St. between NW 6th Ave. and NW 7th Ave.) .... 83
3-9 Classification results for bZ=0.25 m, R=6 m, and aR=2 .... 84
3-10 Classified points near a partially occluded house roof .... 87
3-11 UF campus area for classification of ALSM data .... 89
3-12 Classification of area enclosing UF buildings .... 90
3-13 Classified points on a complex building (Weimer Hall) .... 93
3-14 Classified points on a complex building (Weil Hall) .... 95
3-15 Classification of points on flat and gabled roofs in Weil Hall .... 99
4-1 Weil Hall at the University of Florida .... 103
4-2 Point cloud of fused ALSM and M-TLS data corresponding to the area enclosed by the dashed line in Figure 4-1 .... 103
4-3 Training areas in Weil Hall .... 104
4-4 Classification results for ALSM data .... 106
4-5 Classification results for M-TLS wall points only .... 106
4-6 Classification results for Experiment 1 superimposed on architectural building footprints of Weil Hall and surrounding buildings .... 108
4-7 Classification results for Experiment 4 (perspective view from north-west side) .... 109
4-8 Classification results for Experiment 5 (perspective view from north-west side) .... 110
4-9 Flat roofs on Weil Hall and a nearby building .... 111
4-10 Roof points from the enclosed areas in Figure 4-9 .... 112
4-11 Classification of angled and flat roofs over Weil Hall .... 112
5-1 High resolution location of laser footprints on roofs .... 123
5-2 GUI for ALSM scanning simulator .... 124
5-3 Detected pulse-edge intersections for two simulation runs .... 126
5-4 Building scanned by parallel flight lines .... 127
5-5 Relationship between the flight height of a photogrammetric mission and the accuracy of the maps extracted from the photographic processing .... 133
5-6 Location of a laser footprint with respect to a target edge .... 135
5-7 Relationship between the flight height of a Lidar mission and the expected accuracy of the laser returns .... 136
5-8 Relationship between the flight height of a Lidar mission and the expected accuracy of the footprint corner coordinates .... 137
5-9 ALSM scan line boundaries of the data taken at the University of Florida campus .... 141
5-10 Lidar points color coded by elevation .... 142
5-11 Laser pulse hitting different surfaces .... 143
5-12 Lidar points classified by return number (Classes 2 and 3) .... 144
5-13 Lidar points classified by return-under number (all classes) .... 145
5-14 Lidar points classified by return-under number (all classes but 1) .... 145
5-15 Misleading returns .... 146
5-16 Classification and modeling steps .... 147
5-17 Results from classification .... 148
5-18 Non-ground points classified by return-under number .... 148
5-19 Training areas for building classification .... 150
5-20 Segmentation after majority filter .... 151
5-21 Building roof points classified by return-under number .... 152
5-22 Building points color coded by building part identifiers (top view) .... 155
5-23 Building points color coded by building part identifiers (oblique view) .... 155
5-24 Building points color coded by building identifiers .... 155
5-25 Clustered points from different edges .... 158
5-26 Initial building footprint .... 160
5-27 Initial coarse edge segmentation .... 163
5-28 Final clustering of points on an edge from the initial coarse segmentation .... 164
5-29 Final segmentation of on-edge points .... 165
5-30 On-edge and non-on-edge points on Weil Hall .... 166
5-31 Convolution mask for identifying boundary cells .... 167
5-32 Convolution mask for identifying boundary cells and edge direction .... 169
5-33 Gridded boundary for the building part in Figure 5-30 .... 169
5-34 Representative cell points on Weil Hall building .... 170
5-35 Initial coarse edge segmentation .... 170
5-36 Final segmentation of on-edge points .... 170
5-37 Footprint first adjustment of the point segments shown in Figure 5-36 .... 173
5-38 Footprint final adjustment of the point segments shown in Figure 5-36 .... 173
5-39 Footprint for a complex building roof .... 174
5-40 Building footprints at UF area .... 175
5-41 Building footprint parts at UF area. Building parts are shown for complex roofs .... 176
5-42 Footprint of the Bryant Space Science Center at the University of Florida .... 177
5-43 Footprint of Dauer Hall at the University of Florida .... 178
5-44 Footprint of the Computer Science and Engineering building and the Marston Science Library at the University of Florida .... 178
5-45 Location of GPS points for accuracy testing .... 179
5-46 Definition of boundary points in the convex-hull based algorithm .... 183
5-47 Dominant points and adjusted lines .... 183
5-48 Dominant points on gridded data .... 184
5-49 Dominant points for different processing algorithms .... 186
5-50 Line and footprint adjustment for the Weil Hall roof using different methods .... 187
5-51 Weil Hall at UF campus .... 188
5-52 Slope magnitude of Weil Hall roof surfaces .... 188
5-53 Slope aspect of Weil Hall roof surfaces .... 189


Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

CLASSIFICATION OF BUILDING INFRASTRUCTURE AND AUTOMATIC BUILDING FOOTPRINT DELINEATION USING AIRBORNE LASER SWATH MAPPING DATA

By Jhon Caceres
December 2008
Chair: Ramesh Shrestha
Cochair: K. Clint Slatton
Major: Civil Engineering

Three-dimensional (3D) models of urban infrastructure comprise critical data for planners working on problems in wireless communications, environmental monitoring, civil engineering, and urban planning, among other tasks. Photogrammetric methods have been the most common approach to date to extract building models. However, Airborne Laser Swath Mapping (ALSM) observations offer a competitive alternative because they overcome some of the ambiguities that arise when trying to extract 3D information from 2D images. Regardless of the source data, the building extraction process requires segmentation and classification of the data and building identification. In this work, approaches for classifying ALSM data, separating building and tree points, and delineating ALSM footprints from the classified data are described. Digital aerial photographs are used in some cases to verify results, but the objective of this work is to develop methods that can work on ALSM data alone.

A robust approach for separating tree and building points in ALSM data is presented. The method is based on supervised learning of the classes (tree vs. building) in a high dimensional feature space that yields good class separability. Features used for classification are based on the generation of local mappings, from three-dimensional space to two-dimensional space, known as


“spin images” for each ALSM point to be classified. The method discriminates ALSM returns in compact spaces and even where the classes are very close together or overlapping spatially. A modified algorithm of the Hough Transform is used to orient the spin images, and the spin image parameters are specified such that the mutual information between the spin image pixel values and class labels is maximized. This new approach to ALSM classification allows us to fully exploit the 3D point information in the ALSM data while still achieving good class separability, which has been a difficult trade-off in the past.

Supported by the spin image analysis for obtaining an initial classification, an automatic approach for delineating accurate building footprints is presented. The physical fact that laser pulses that happen to strike building edges can produce very different first and last return elevations has long been recognized. However, in older generation ALSM systems (<50 kHz pulse rates) such points were too few and far between to delineate building footprints precisely. Furthermore, without the robust separation of nearby trees and vegetation from the buildings, simply extracting ALSM shots where the elevation of the first return was much higher than the elevation of the last return was not a reliable means of identifying building footprints. However, with the advent of ALSM systems with pulse rates in excess of 100 kHz, and by using spin-image based segmentation, it is now possible to extract building edges from the point cloud. A refined classification resulting from incorporating “on-edge” information is developed for obtaining quadrangular footprints. The footprint fitting process involves line generalization, least squares-based clustering and dominant-point finding for segmenting individual building edges. In addition, an algorithm for fitting complex footprints using the segmented edges and data inside footprints is also proposed.


CHAPTER 1
INTRODUCTION

Over the last 5 to 10 years, Airborne Laser Swath Mapping (ALSM) technology, also known simply as airborne LiDAR (light detection and ranging), has become widely available to the remote sensing research community, and many fields of application have been proposed. The majority of the early successful applications were for terrain mapping and surveying, but the high resolution data delivered by this technology has allowed researchers to apply it to problems in forestry, civil engineering, urban planning, and many other fields.

Modern ALSM instruments can routinely provide data point densities of 1-10 points per square meter, depending on the system configuration, flying speed and altitude of the aircraft, and overlap of adjacent swaths. The position accuracy of the points is roughly 15-20 cm in the x and y dimensions and 5-10 cm in the z (height) dimension (Shrestha et al., 1999), which is sufficient for terrain mapping and general building detection for urban planning. Explicit three-dimensional (3D) building models are an important input for environmental monitoring and civil engineering in urban settings (Shiode, 2000), and for such ancillary applications as estimating microwave line-of-sight communications and emergency responder plans.

Urban models have been obtained through the processing of several data sources, including photogrammetry (metric photographs) and remote sensing imagery. The processing required depends on the data type and information sought. In general, when photogrammetry is used for urban modeling, the positional accuracy of objects is excellent, but because much of the processing must be done manually, the production costs and time delays are both large. In comparison, remote sensing technology can ameliorate costs.

Because of the high degree of computer automation, ALSM observations can be processed relatively quickly and yield high positional accuracy, making ALSM data well suited for urban


modeling. From the ALSM data, represented by a three-dimensional point cloud, digital terrain models (DTM) of the bare Earth can be obtained with accuracies comparable to, or even better than, those from photogrammetric methods (Smith, 2003), with significantly less tedious and expensive processing. If the ALSM observations can be properly segmented into building and non-building classes, high quality urban models can be generated virtually “automatically” using personal computers and commercially available software packages. However, today some issues remain regarding the reliability of classification and extraction of the building and tree points from ALSM point clouds.

Classifying ALSM observations into building and non-building classes is difficult in areas where buildings are located near trees or when trees, vegetation and other non-building objects actually overlap building points. Complex, non-rectangular, non-flat roofs also make precise building segmentation difficult. The main cause is that most methods derive not true 3D representations but rather 2.5D surfaces (surfaces accounting for, or summarizing, one unique height value per horizontal spot or cell, usually represented by matrices) and vertical profiles of the scanned area from the ALSM data, without considering the implicit three-dimensional geometric information embedded in the data. When overlapping tree branches over building roofs are scanned, the ALSM points obtained are located at the same or very close planimetric coordinates but different heights. The 2.5D representation of this data fuses the points into a single height sample, resulting in lost data. When the data is classified, building models are extracted by deriving their footprints and fitting geometric models to their roofs.
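To make the information loss concrete, here is a minimal sketch (not from the dissertation; the function name and 1 m cell size are illustrative) of gridding a multi-return point cloud into a 2.5D DSM that keeps only the highest return per cell, discarding a co-located roof return beneath a canopy return:

```python
import numpy as np

def grid_to_dsm(points, cell_size=1.0):
    """Collapse a 3D point cloud into a 2.5D DSM grid, keeping only the
    highest return per cell (the information loss described above)."""
    xy = np.floor(points[:, :2] / cell_size).astype(int)
    xy -= xy.min(axis=0)                      # shift cell indices to start at 0
    nx, ny = xy.max(axis=0) + 1
    dsm = np.full((nx, ny), np.nan)
    for (i, j), z in zip(xy, points[:, 2]):
        if np.isnan(dsm[i, j]) or z > dsm[i, j]:
            dsm[i, j] = z                     # lower returns in the cell are discarded
    return dsm

# Two returns from the same pulse: canopy at 12 m, roof at 8 m directly below.
pts = np.array([[5.2, 3.1, 12.0],   # first return (branch)
                [5.2, 3.1,  8.0]])  # last return (roof, lost in the DSM)
dsm = grid_to_dsm(pts)
```

After gridding, only the 12 m canopy height survives in the cell; the roof return that a point-based classifier could still exploit is gone.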
Although the slope and orientation of the faces and surfaces making up the roofs can be calculated with high accuracy, the delimitation of the roofs given by their footprints has not been computed with similar accuracy, because the ALSM data lacks


information about the specific location at which the laser pulse hits any given object—only approximate estimates about the locations of building edges can be made.

The research proposed here utilizes novel approaches to identify and classify non-ground ALSM data and to delineate building footprints using the classified data. For data classification, the approach uses three dimensional point clouds where each laser return is analyzed and classified according to knowledge acquired in a “machine-learning” process. For footprint delimitation, clues obtained from the previous classification and the data capture process are used to directly identify building edges and accurately determine their location, by detecting “edge-hitting” laser pulses.

Below, a short review of the main processing strategies used in algorithms for classifying and modeling is given. Then, the main aspects of the proposed research approach, highlighting its novel contributions, are described, along with the proposal outline.

1.1 Classification and Modeling Strategies

1.1.1 Classification Strategies

As part of the urban modeling process, classification of ALSM data can be seen as the necessary preparatory step to the final modeling. During the classification process the spatial locations and sometimes spectral values of laser points are analyzed and processed to separate building points (or data cells when the data is in grid form) from their non-building counterparts. To do this, spectral and geometric information, and additional derived features related to each ALSM point or cell, are analyzed using several strategies, including most commonly edge detection, region growing, clustering, and vertical profiling.

Early classification efforts relied on the use of digital image processing techniques over digital surface models (DSMs). But DSMs can be thought of as simply a gridded form of the unevenly spaced raw ALSM observations, making it natural to use existing techniques in the


analysis of ALSM data. For example, edge detection through gradient and surface normal computations has been well established as a useful approach to separating regular shapes such as rectangular building footprints from irregular shapes related to vegetation. Haithcoat et al. (2001) and Alharty and Bethel (2002) use this tool in their building modeling work—the most difficult issue being the definition of appropriate threshold values.

Another technique borrowed from digital image processing is region growing. This strategy requires the initial definition of seed cells for each class, to spatially grow and expand specific classes towards neighboring cells in which similar conditions are found to exist. Assessing similarity is done by comparing cell features, and when conditions are met, neighbor points in the spatial domain are added to the appropriate seed class. Zhang et al. (2006), for example, use local height variability at the cell location as the feature in the region growing scheme. Although used most often with DSMs, the technique can be applied to raw ALSM point clouds as well.

Because most airborne lidar data is acquired using a raster (1D) side-to-side scanner, analysis of data collected along individual scan lines is another approach used to segment and classify data. Large changes in the observed features of a point with respect to neighboring points along a scan line suggest a change of state in the data. By comparing feature values to predefined thresholds, assignment of the points to classes can be carried out. The profiles collected along the scan lines can be considered as narrow slices of the surface models. Figure 1-1 shows how a scan line or any one-dimensional line selection of ALSM data can be used for segmentation purposes.

The fact that only the points in the scan line are used for the analyses, based on queries such as height difference and slope computations, results in very fast processing times. On the other


hand, the one dimensional nature of scan line processing does not make use of descriptive information captured by points in neighboring scan lines. And there are some restrictions on certain directions used in the analyses, which result in features that are not invariant with respect to most geometric transformations. Different variations of this idea are used by Axelsson (1999), Sithole and Vosselman (2006) and Shan and Sampath (2005).

Finally, in the same way remote sensing imagery has been analyzed through supervised and unsupervised classification schemes, pattern recognition techniques have also been applied to analyze gridded and raw ALSM data. Clustering, in the form of unsupervised classification, uses point or cell feature similarity as the criterion for classification. In an iterative process, feature values are computed for each datum to be classified, and similarity distances to the already established clusters/classes are computed. Cluster assignment is determined by the smallest similarity distance. An advantage offered by clustering is that spatial and spectral features with their own weights can be blended together in the similarity measure computation. Elberink and Maas (2000), and Haala and Brenner (1999) use the technique over gridded data. Filin and Pfeifer (2006) incorporate clustering into the analysis of ALSM point clouds.

Figure 1-1. Scan line over an urban landscape. The vertical profile is a subset of the scanned point cloud. By using local point features (computed at each point) such as slope variation and height variance, ground points (in red) can be separated from non-ground points (in blue).
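The scan-line strategy can be sketched as a one-dimensional segmentation by height jumps; the 1.5 m threshold below is purely illustrative, not a value from any of the cited works:

```python
import numpy as np

def segment_scan_line(heights, jump_threshold=1.5):
    """Label points along one scan line, starting a new segment wherever the
    height difference to the previous point exceeds a threshold (a change of
    state in the data, as described above)."""
    labels = np.zeros(len(heights), dtype=int)
    segment = 0
    for k in range(1, len(heights)):
        if abs(heights[k] - heights[k - 1]) > jump_threshold:
            segment += 1                      # large jump: start a new segment
        labels[k] = segment
    return labels

# Profile crossing a building: ground, roof, ground again.
profile = [30.1, 30.2, 30.1, 38.5, 38.6, 38.4, 30.2, 30.3]
labels = segment_scan_line(profile)
```

The three resulting segments (ground, roof, ground) could then be assigned to classes by comparing their mean heights, which also illustrates the method's weakness: the decision uses no information from neighboring scan lines.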


Our approach for classifying ALSM data is explained in Chapter 3, where a more detailed analysis of previous work on classifying ALSM data is also given.

1.1.2 Modeling Strategies

Several processing strategies have been developed for working with photographic, radar, and ALSM data, separately or in combination, to obtain three dimensional (3D) models and footprints of buildings. The most accurate models currently are obtained through photogrammetric methods, but they require costly interaction with expert users.

In general, the photogrammetric production line for 3D processing requires several steps to transform aerial photographs into DSMs, orthophotos and vector maps. First, to measure elevation values, features must be identifiable in at least two photographs; therefore, overlapping pairs of photographs (stereo pairs) are needed. Then, to correct for any tilt of the camera's optical axis from the vertical axis at capture time, and for the relief displacement that any aerial photograph presents, vertical rectification and differential rectification must be carried out. The rectifications are carried out using stereo plotters or digital stations and are based on the collinearity and coplanarity conditions a stereo pair obeys. For the rectification, internal and external orientations of the camera are computed by using ground control points (GCPs), which tie surveyed points to points on the pictures, in an aero-triangulation process.

Besides rectification, geo-referencing of the data is required. Elevations can be computed when a feature is found in both pictures (stereo matching), and by computing many elevations, a digital elevation model (DEM) can be directly digitized. Orthophoto production requires those DEMs, and vector maps can be digitized from orthophotos or captured at DEM production time.
Figure 1-2 shows a very simplified flow diagram with the general required steps for producing photogrammetric products.


Because of that complexity, many of the algorithms do not even need previous segmentation or classification; instead, they directly extract building edges using interactive user expertise. When a high degree of automation has been attempted under this paradigm, the accuracy typically suffers significantly, as in the works by Gruen (1998), Gruen and Wang (1998) and Zebedin et al. (2006). The main obstacle faced by those methods is the correct matching of features on the two photographs composing the stereoscopic models.

On the other hand, urban modeling through ALSM data gives more automation possibilities because the three dimensional information is inherent to the data and not derived from stereoscopic models. Once classified data is available, ALSM points or cells are used to fit 2D building footprints or 3D building models.

Figure 1-2. Simplified photogrammetric production line. Vertical and differential rectification requires the computation of internal and external orientations of the camera. DEMs are the initial main product sought when stereo pairs are used. Vector maps can be produced either during stereo matching processing or by digitization over orthophotos.


When 3D roof shapes are modeled, two main strategies can be used: fitting of extremely simple 2D shapes to the data such as planar faces (data-driven), and fitting of more complex 3D models such as rectangles, cones, tetrahedrons, spheres and other parametric shapes (model-driven). ALSM raw data is well suited to accurately compute the orientation of roof faces in space. An example of a data-driven method over ALSM data is the Hough transform-based method by Maas and Vosselman (1999). Using a model-driven method, at some cost in positional accuracy, Alharty and Bethel (2002) and Haithcoat et al. (2001) fit parametric models over gridded data after segmentation. Both of these methods can suffer greatly if the roof shapes are irregular, partially occluded by overhanging trees, or populated by roof-top structures, such as air conditioners, chimneys, or dormers. Fitting planes does not directly enforce continuity of the plane intersections, so the modeled roof facets may not even fit together. Fitting of 3D volumes can mitigate this problem, but small roof-top objects or occlusions can cause the fitted shape to be larger than the actual building, or misaligned.

Although the methods stated above can infer roof shapes, the accuracy of the building edge location for footprint delineation is compromised, due primarily to the fact that the exact location of laser pulse hits on building structures is unknown. A few examples of how footprint locations have been estimated include least squares adjustment of the outermost building points (Sampath and Shan, 2007; Zhang et al., 2006), intermediate location between building and ground points (Morgan and Habib, 2002), and adjustment restricting building points to be inside the footprint (Maas and Vosselman, 1999).

1.2 Contributions and Proposal Outline

The main goal of this work is to present and develop two new methods for classification of ALSM data and delineation of highly accurate building footprints.
These methods have the potential to improve subsequent building modeling, but a detailed treatment of building


modeling will not be addressed. The methods proposed use strategies and information extraction methods that have not previously been applied to ALSM data.

Regarding the classification task, the method acquires building class knowledge through supervised machine learning of point features. Machine learning refers to the set of methods and algorithms that allows computers to acquire knowledge from data. Statistical methods, among other tools, transform training data into knowledge bases used most of the time for decision making. The ALSM point features used for classification consist of a 3D triplet (X, Y, Z) in a geo-referenced Cartesian coordinate frame. The X and Y values are typically in units of meters in the UTM Northing and Easting format, whereas Z is typically in meters from the WGS-84 ellipsoid. Although final results are given in UTM format, conversion to local coordinate frames, where values range from 0 to a few hundred meters, is done for algorithm processing.

Because the laser beam can penetrate vegetation canopies (photons pass through even the smallest openings in the canopies), this data format allows us to discriminate spatially overlapping classes in the case of overhanging vegetation. Classification can be carried out over each one of the points in the cloud instead of over 2.5 dimensional gridded surfaces. Also, because of the way the class information is obtained, no empirical thresholding of heights or slopes is required. This is important because the reliance on such case-specific thresholds by many other methods can dramatically degrade the usefulness of algorithms when they must be applied to new locations. As to the delineation of building footprints, apart from the initial training in the classification stage, the method is completely automatic and uses mostly information on building edge locations extracted from the data.

The main novelties and contributions of this work include:


Use of spin images for training and classification of ALSM data. Because the classification is done over raw data, the point features chosen for the task must describe in some way the geometric location of the scanned object inside the local, three-dimensional point neighborhood. Spin images are bi-dimensional images whose pixels describe the geometric relationships between points in the neighborhood. As such, images derived from similar points present similar pixel values in specific image locations. For this reason, image pixels work as point features in the scheme. Spin images, an idea originating in the Computer Vision field, can be seen as a tool that summarizes or condenses the information from many possible scan-line profiles into a simpler set of pixel values.

Modification of the Hough transform algorithm for spin image generation. The class signature that is extracted from spin images works at its best if the image is generated in a local surface-normal neighborhood frame. To transform spatial coordinates to that frame, a plane fitting to the points in the neighborhood is carried out. Because least squares adjustment and RANSAC methods are highly influenced by outliers, a modified Hough transform algorithm is developed that forces the plane to contain the point on which the image is generated. In addition, this modification saves computational time over the traditional Hough transform, which must exhaustively search the planar parameter space.

Classification of ALSM data that overlaps in the horizontal space. Unlike methods that use 2.5-dimensional surfaces or profiles, the classification is done over three dimensional ALSM points. Because of the multi-return data generated by ALSM systems, portions of buildings below vegetation can be scanned. As the method works on a point basis, the points can be classified without being affected by points located immediately above or below. This is obtained due to the implementation of spin image pixels as point features, and the selection of ALSM raw data as input data.

Use of a new method for computing mutual information (MI) in a very high dimensional feature space, and utilization of this method to optimize the spin image parameters. The right use of spin images for classification depends on the definition of the image parameters that define the pixel size and its spatial coverage. To choose the optimal parameter set, MI is used to calculate how good a training set is at classifying data. MI is computed by using class information through estimation of probability density functions, and computing these functions for class information defined by many features is not easy. A method for computing that value is defined and used for that task.

Automatic and accurate delimitation of footprints. The delimitation and definition of vector building footprints reaches high accuracy when photogrammetry supports the processing. Methods using ALSM data work well at speeding up and automating the process, improving on the weaknesses of photogrammetric methods in those aspects; however, no ALSM method to date has offered comparable accuracy in the detected building edges. A method that works over ALSM raw data and delimits footprints by using pulse returns from building edges is proposed, to obtain better positional accuracy of the footprints.

Novel use of ALSM data for building modeling. In addition to the three-dimensional spatial coordinates of ALSM data points, the return number, an attribute that each point has and


that gives information about which portion of the laser pulse hits the target, is used for classification of raw data. That is possible due to the clues given by the interaction of laser pulses with the targets they hit. As part of the strategy to obtain accurate building footprints, the configuration of pulse repetition rate and beam divergence angles in the ALSM system, along with its multi-return capacity, are used to identify ALSM points on building boundaries.

A more detailed description of how these contributions are used and the results of the proposed methods are presented in this proposal. Chapter 2 offers a background on ALSM systems and the theory needed for better understanding of the classification and modeling methods. In Chapter 3, the method for classifying ALSM raw data is presented. For testing the method, observations over urban and suburban areas were collected, and classification results are compared to ground truth data. In Chapter 4, an application of the classification method shows how the scheme works for higher resolution data, in this case, point clouds obtained by a ground laser scanner. Classification improvements are shown when ALSM data and ground data are fused. Finally, Chapter 5 describes the method for delineating footprints from the classified data, and shows how the classification enables the footprints of buildings nearby or partially overlapped by vegetation to be traced.
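As a rough illustration of the spin-image idea discussed above, the sketch below maps each neighbor of an oriented point into cylindrical (alpha, beta) coordinates and accumulates a small histogram image. It is a simplification: the surface normal is supplied directly rather than estimated with the modified Hough transform, and the bin parameters are arbitrary rather than optimized via mutual information as in Chapter 3.

```python
import numpy as np

def spin_image(p, normal, neighbors, bin_size=0.5, n_bins=8):
    """Minimal spin-image sketch: map each neighbor of the oriented point p
    to (alpha, beta) cylindrical coordinates about the normal and accumulate
    a 2D histogram. Bin parameters here are illustrative only."""
    n = normal / np.linalg.norm(normal)
    d = neighbors - p
    beta = d @ n                                   # signed distance along the normal
    alpha = np.sqrt(np.maximum((d * d).sum(axis=1) - beta**2, 0.0))  # radial distance
    img = np.zeros((n_bins, n_bins))
    i = ((beta + n_bins * bin_size / 2) / bin_size).astype(int)  # center the beta axis
    j = (alpha / bin_size).astype(int)
    keep = (i >= 0) & (i < n_bins) & (j >= 0) & (j < n_bins)
    np.add.at(img, (i[keep], j[keep]), 1)          # histogram accumulation
    return img

p = np.array([0.0, 0.0, 0.0])
pts = np.array([[0.4, 0.0, 0.1], [0.0, 0.6, -0.2], [1.0, 1.0, 0.0]])
img = spin_image(p, np.array([0.0, 0.0, 1.0]), pts)
```

Because (alpha, beta) depend only on distances from the oriented point and its normal, nearby points on a planar roof and points scattered through a tree crown produce visibly different pixel patterns, which is what makes the pixels usable as classification features.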


CHAPTER 2
AIRBORNE LASER SWATH MAPPING AND THEORETICAL BACKGROUND

A background on how ALSM data is captured is given, along with technical specifications of the systems used at the University of Florida. Also, a short review of the basic theory and methods used in the classification and modeling methods is given.

2.1 Airborne Laser Swath Mapping Background

Airborne Laser Swath Mapping (ALSM) is a remote sensing technology that uses the time-of-flight of laser pulses to determine distances from an aircraft to reflective surfaces on or near the surface of the earth. When combined with the position and orientation of the aircraft, as determined by GPS and inertial measurement unit observations, the three dimensional spatial coordinates of the surface points can be deduced. ALSM is an active remote sensing method, and is also commonly referred to as lidar (light detection and ranging), ladar (laser detection and ranging), or airborne laser scanning (Shrestha et al., 1999).

By using a laser operating at thousands of pulses per second, three dimensional spatial coordinates at meter or sub-meter spacing are obtained. The unequally spaced raw ALSM points can be interpolated to produce gridded digital elevation models (DEMs). Because the raw data stream includes information from the several instruments (GPS, IMU, scanner, range electronics) on the aircraft, and additional data from ground based GPS receivers, the data are assembled and processed generally after the completion of the flight.

More specifically, the ALSM system is made up of three principal subsystems: the laser scanner, the inertial measurement unit (IMU) and the global positioning system (GPS) receiver. The scanner is the most important sub-system; it emits laser pulses and senses back energy reflections of the pulses hitting objects on the surface below. Usually, the pulses are directed toward the Earth by an oscillating mirror that moves perpendicular to the


flight direction, producing a saw-tooth scanning pattern, but rotating polygons and nutating mirrors (Palmer scan) are also used in other scanning schemes (Wehr and Lohr, 1999), resulting in parallel-line and elliptical patterns respectively. The rate at which the pulses are emitted, along with the angle of the mirror, the flight height and the aircraft velocity, define the sampling resolution of the scanning process and the diameter of the spot footprint on the surface. Also, the maximum scanning angle defines the width (swath) of the data strip.

Regarding the reflected pulses, due to the interaction of the pulse energy with objects on the surface, and the nature of the objects themselves, more than one group of reflected photons can bounce back and more than one location (return) can be sensed for the same laser pulse (assuming the system has that capacity). In addition, the relative strengths of the reflected signals can be stored as intensity values.

The other two sub-systems, a GPS receiver and an IMU, measure aircraft position and orientation. At least one ground GPS base station is used in conjunction with the on-board GPS data to eliminate common mode errors in the kinematic solution. The IMU measures changes in the ALSM sensor's attitude (roll, pitch, and yaw), and accelerations in X, Y and Z, several hundred times a second. The resulting attitudes and accelerations are finally blended with GPS positions, range and scan angle data to obtain the X, Y, Z coordinates of the surface points. Horizontal accuracies range from 15 to 20 cm and vertical (Z) accuracies range from 5 to 10 cm, depending on flight conditions and ALSM system configuration. An illustration of the system and scanning process components is shown in Figure 2-1.

As mentioned above, the repetition rate of the laser pulses, among other factors, affects the surface point density of the scanned data. Also, the physical characteristics of the laser pulse and its interaction with the ground targets influence the conditions to obtain returns from the scanned


area. Below, a more detailed description of the system parameters and how they affect the point cloud attributes is given.

Figure 2-1. ALSM system components and scanning process.
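The ranging principle underlying the scanner reduces to a one-line computation; this is a generic time-of-flight sketch, not code from any particular ALSM system:

```python
C = 299_792_458.0  # speed of light in vacuum, m/s

def pulse_range(round_trip_time_s):
    """Range from sensor to target from the laser pulse time of flight.
    The pulse travels out and back, hence the factor of 1/2."""
    return C * round_trip_time_s / 2.0

# A return sensed 4 microseconds after emission corresponds to roughly 600 m.
r = pulse_range(4e-6)
```

The geo-referencing step then rotates and translates this range vector using the scan angle, IMU attitude and GPS position to obtain the final X, Y, Z coordinates.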


2.1.1 Parameters Affecting the Scanning Pattern

Before explaining the details of the pattern behavior, let us define the parameters:

Maximum Scan Angle (MSA): Maximum angle measured left and right of the sensor nadir to which laser pulses can be distributed.
Scan: One full left-to-right and right-to-left cycle of the scanner mirror.
Scan Frequency (SF): Number of full scans per second.
Laser Repetition Rate (LRR): Number of pulses per second emitted by the system.

In addition to these parameters, the attitude, altitude and velocity of the airplane on which the system is mounted affect the pattern of laser points on the reflecting surface. These parameters vary continuously during the data collection and result in significant departure of the point pattern and points per square meter from the nominal planning values.

The saw-tooth pattern seen from a top view of the laser pulses hitting the ground is a consequence of the across-flight movement of the oscillating mirror and the along-flight movement of the sensor head. The mirror helps to project the pulses towards the ground, and it moves at an angular velocity specified by the maximum scan angle and the scan frequency. This relation is given by:

AV = Angle / Time = (2 MSA) / (1 / (2 SF)) = 4 (MSA)(SF)    (2-1)

where AV is the angular velocity, and Angle and Time are the angle span and the time spent to cover half a scan. The angle measured between the paths of two consecutive pulses is a function of the angular velocity and the laser repetition rate. By knowing the time span between two consecutive pulses, we can measure how much the mirror rotates during that time:

ACP = AV / LRR = 4 (MSA)(SF) / LRR    (2-2)


where ACP is the angle the mirror rotates between consecutive pulse emissions. Figure 2-2 illustrates these parameters and typical values.

The ACP directly defines the spacing between spots along the scan line, and as the angle increases the spot spacing does also. That is to say, as the scan frequency and the maximum scan angle increase, the spacing also increases. The same effect is seen when the airplane's flight height or ground speed increases: with the remaining parameters held constant, the swath width or the along-track coverage per unit time increases, thereby increasing the spot spacing.

Figure 2-2. Snapshot of an ALSM system while scanning. Figure not to scale. ACP stands for the angle between the trajectories of consecutive pulses.
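Equations 2-1 and 2-2 are simple enough to check numerically. The short sketch below uses the nominal Optech 1233 settings quoted in Section 2.2 (28 Hz scan frequency, 20° maximum scan angle, 33 kHz repetition rate); it is an illustration, not part of the processing pipeline:

```python
def angular_velocity(msa_deg, sf_hz):
    # Eq. 2-1: the mirror sweeps 2*MSA degrees in half a scan period,
    # i.e. in 1/(2*SF) seconds, so AV = 4 * MSA * SF (degrees per second).
    return 4.0 * msa_deg * sf_hz

def angle_between_pulses(msa_deg, sf_hz, lrr_hz):
    # Eq. 2-2: angle swept during the 1/LRR seconds between pulse emissions.
    return angular_velocity(msa_deg, sf_hz) / lrr_hz

av = angular_velocity(20.0, 28.0)                 # 2240 deg/s
acp = angle_between_pulses(20.0, 28.0, 33000.0)   # ~0.068 deg between spots
```

At 600 m above ground, an across-track angular step of about 0.068° corresponds to roughly 0.7 m between spots at nadir, consistent with the roughly 1 m footprint spacings discussed later.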


The width of any saw tooth in the pattern, which also defines the along-track distance between points, depends on how fast the airplane moves and on the angular velocity of the mirror. At higher aircraft velocities the width increases relative to lower velocities because the distance traveled along the flight line increases. On the other hand, as the mirror angular velocity increases, more scans are covered within the same time, and therefore the tooth width and the along-track point distances decrease. It is worth pointing out that while increases in scan frequency produce higher angular velocities and shorter along-track distances, they at the same time make the spot spacing on the scan line larger, as noted in the previous paragraph.

2.1.2 Laser Pulse Characterization and Target Interaction

We can think of an emitted laser pulse as millions of photons at a specific wavelength containing some amount of energy. The pulse interacts with reflecting surfaces, and a fraction of its energy is returned to the receiver. The number of photons in a pulse and the energy the pulse contains are related by:

    E = N h c / λ    (2-3)

where E is the energy measured in joules, h is Planck's constant, N is the number of photons, c is the speed of light, and λ is the wavelength. The laser pulses are created by a process known as Q-switching and can range in length from hundreds of picoseconds to several nanoseconds (Saleh and Teich, 1991). Multiplying the speed of light by the pulse length measured in units of time yields the pulse length in spatial units.

The laser beam is up-collimated to reduce the divergence, and the approximate laser footprint at the target is given by (Baltsavias, 1999):

    D_f = h γ / cos²(θ)    (2-4)


where D_f is the footprint diameter, h is the flight altitude, γ is the beam divergence, and θ is the instantaneous scan angle. When the system is pointed toward nadir, the footprint diameter is given by D_f = h γ.

Figure 2-3. Pulse characteristics. l is the pulse length, D_f is the footprint diameter, and θ the scan angle.

Because of the small divergence, and the large difference between the pulse length and the footprint diameter at the ground, the laser pulse can be considered an energy cylinder whose length depends on the switching time and whose diameter is the footprint diameter (see Figure 2-3).

The distribution of energy in the laser pulse can be analyzed in two ways: across and along the pulse length. Optical mechanisms in laser systems spread the photons almost uniformly across the face of the beam, which means that the amount of energy interacting


with the target can be considered proportional to the portion of the footprint area intersecting the target surfaces. Along the pulse, the energy is more concentrated in the middle than at the extremes, and its power as a function of time resembles a Gaussian shape.

Only a small percentage of the energy transmitted in each laser pulse completes the round trip from the transmitter to the reflective surface and back. Photons are absorbed and scattered by air molecules and pollutants in the atmosphere, causing a loss of signal proportional to the path length through the atmosphere. The interaction with reflecting surfaces depends on the reflectivity of the surface at the wavelength of the laser light, the surface texture, and the angle of incidence. Finally, characteristics of the receiver, including the optics, the spatial and chromatic filters, and the quantum efficiency of the photo-detector, account for the final number of returning photons detected. Due to all these conditions, a minimum percentage of the pulse energy must hit a reflecting surface for the surface to be detected.

2.2 ALSM System Description

The systems used for capturing data for this work were provided by the Geosensing Engineering and Mapping (GEM) Research Center at the University of Florida. The Center owns and operates two ALSM units. The first ALSM unit is an Optech, Inc. 1233 system. It is mounted in a Cessna 337 Skymaster aircraft and typically flies at altitudes of 400 m to 1,000 m above ground level. The laser emits pulses of near-infrared (λ = 1064 nm) light that are 10 to 12 nanoseconds long (full width at half maximum, FWHM) at a rate of 33,000 Hz. It has an oscillating mirror scanner with selectable scanning rate and amplitude, typically operated at 28 Hz and ±20°.
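Equations 2-3 and 2-4 can be checked against the 1233 figures just quoted. In the sketch below, the 0.25 mrad divergence is an assumption inferred from the 15 cm footprint at 600 m reported for this system, not a published specification:

```python
import math

H_PLANCK = 6.626e-34   # Planck's constant, J s
C_LIGHT = 2.998e8      # speed of light, m/s

def pulse_energy(n_photons, wavelength_m):
    # Eq. 2-3: E = N h c / lambda (joules).
    return n_photons * H_PLANCK * C_LIGHT / wavelength_m

def footprint_diameter(h_agl_m, divergence_rad, scan_angle_rad=0.0):
    # Eq. 2-4 (Baltsavias, 1999): D_f = h * gamma / cos^2(theta);
    # at nadir (theta = 0) this reduces to D_f = h * gamma.
    return h_agl_m * divergence_rad / math.cos(scan_angle_rad) ** 2

spatial_pulse_length = C_LIGHT * 10e-9                  # a 10 ns pulse spans ~3 m
nadir_footprint = footprint_diameter(600.0, 0.25e-3)    # 0.15 m, i.e. 15 cm
```

Note that the ~3 m spatial pulse length is roughly twenty times the footprint diameter, which is why the "energy cylinder" picture above is long and thin.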
When flying at 600 m above ground level, the individual laser footprints are approximately 15 cm in diameter and are distributed over a swath approximately 400 m wide, yielding a spot density of roughly 1 spot per square meter on the ground. The system captures the first and last return for each emitted pulse, and there is a minimum range distance between these


two pulses to be registered as different energy responses. This limitation is related to the pulse length and to the fact that the system's electronic circuitry has a physical limitation: it must sense and store one response before it is able to record the second, resulting in a dead time of several nanoseconds between stops. The minimum distance between reflecting surfaces must be approximately 3 meters for both surfaces to be detected in a single pulse.

The second ALSM unit owned by UF is a second-generation system, called the Gemini, also manufactured by Optech Inc. The Gemini can be operated at pulse repetition rates ranging from 33 kHz to 167 kHz, but performs best at pulse rates of 70 to 125 kHz. The higher rates result in 3 to 5 points per square meter at the flying heights and scan rates typically used. Four returns can be recorded for each shot, including the first and last ones. Two beam divergences can be selected, narrow (0.3 mrad) and wide (0.8 mrad), making the system more flexible with respect to the different characteristics of the surfaces to be scanned. The Gemini has an upgraded scanner, which provides higher-frequency and higher-amplitude scanning, and an improved IMU to better determine the orientation of the aircraft. Figure 2-4 shows photographs of the two ALSM systems.

Figure 2-4. ALSM systems at UF. A) Gemini system. B) 1233 system. C) 1233 system mounted in the Cessna 337.
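The swath width, point density, and minimum surface separation quoted in this section can be sanity-checked with simple round-trip geometry. In this sketch, the 60 m/s ground speed is an assumed typical survey speed, not a value stated for these flights:

```python
import math

C_LIGHT = 2.998e8  # speed of light, m/s

def swath_width(h_agl_m, msa_deg):
    # Ground swath of a symmetric scanner: 2 * h * tan(MSA).
    return 2.0 * h_agl_m * math.tan(math.radians(msa_deg))

def point_density(lrr_hz, swath_m, ground_speed_ms):
    # Pulses emitted per second divided by ground area covered per second.
    return lrr_hz / (swath_m * ground_speed_ms)

def dead_time_for(separation_m):
    # A second surface is resolved only if its extra round-trip delay
    # exceeds the receiver dead time: t = 2 * d / c.
    return 2.0 * separation_m / C_LIGHT

w = swath_width(600.0, 20.0)             # ~437 m, close to the quoted ~400 m
rho = point_density(33000.0, w, 60.0)    # ~1.3 returns per square meter
t = dead_time_for(3.0)                   # ~20 ns implied by the 3 m limit
```

The ~20 ns figure is consistent with the "dead time of several nanoseconds" mentioned above, plus the 10 to 12 ns pulse width itself.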


2.3 Theoretical Background

The data analysis methods proposed here are based on approaches and methodologies used in a variety of related fields, including some not common in remote sensing. A short description of how pattern recognition schemes and feature descriptors are related to machine vision systems is given, along with details on Bayesian classification. Also, a short discussion of local neighborhood frames for computing object and point features is presented.

2.3.1 Pattern Recognition and Computer Vision

The recognition of specific objects within a mixed set of objects, using photographic images or laser scanning observations, is a critical step in the functioning of machine or computer vision (CV) systems. CV systems include all the hardware and software required to sense the scene; preprocess, segment, and describe the data; and recognize specific objects (Bennamoun and Mamic, 2002). CV systems typically work with single pictures or sets of pictures, or with laser range images that directly measure three-dimensional data.

In general, recognition means to search for a very specific object in a model database. When the recognition implies labeling an object with one of a set of predefined or "to be defined" classes (not necessarily already stored in databases), the term used is classification. Whichever the case, many approaches for recognizing or classifying are available. The basic idea on which most of them rely is to compare, using a similarity measure, object features with the features of the stored model objects to find out which model is most similar to the analyzed scene object. Bayesian classification, support vector machines, neural networks, template matching, and clustering are among the approaches used.

The recognition phase of CV systems depends heavily on the way the data are represented. Typically, the set of models to be recognized is stored using some data description.
Objects extracted from the scene imagery are then compared to the model database. When a


sensed object description matches the description of a stored model, recognition is achieved. For spectral images, descriptions of the objects are given by the features (e.g., shape, size, texture) computed from an image segment (an image subset made up of pixels aggregated according to some segmentation criterion).

For range images, the three-dimensional data are summarized in computationally more tractable representations that describe partial or complete objects. Many ideas have been proposed to represent 3D objects and their parts. Similar to the spectral case, object features represent the objects (or object parts), and they can be extracted at some specific portion of the object or can summarize more global characteristics of the object. The features are extracted from curves and surfaces fitted to the object, or from the point cloud. Among such representations, spin images (Johnson, 1997) were initially proposed for surface matching and were later found suitable for object recognition. Spin images are feature descriptors that summarize three-dimensional information from local neighborhoods into two-dimensional images.

2.3.2 Bayesian Classification

Bayesian decision theory is one of the main tools used for statistical pattern classification. This theory is based on the Bayes formula:

    P(ω_j | x) = p(x | ω_j) P(ω_j) / p(x)    (2-5)

where P(ω_j | x) is the probability that an object having feature measurement x belongs to the class ω_j. If the posterior probabilities for the j classes can be computed, the object having feature measurement x is assigned to the class ω_j that produces the largest value of P(ω_j | x). To compute the posterior probability for each class, the likelihood p(x | ω_j), the prior probability P(ω_j), and the evidence p(x) are needed. Training data are used to estimate these probabilities.
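Equation 2-5 can be put in code form directly. The following toy one-dimensional illustration uses hypothetical Gaussian likelihoods for two classes; it is not the feature model used in this work:

```python
import math

def gaussian(mu, sigma):
    # Returns a 1-D Gaussian likelihood function p(x | class).
    return lambda x: math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def posteriors(x, likelihoods, priors):
    # Eq. 2-5: P(w_j | x) = p(x | w_j) P(w_j) / p(x),
    # with the evidence p(x) = sum_j p(x | w_j) P(w_j).
    joint = [lik(x) * pr for lik, pr in zip(likelihoods, priors)]
    evidence = sum(joint)
    return [j / evidence for j in joint]

# Two classes (say, "tree" and "building") with equal priors; the Bayes
# decision rule assigns x to the class with the largest posterior.
post = posteriors(1.0, [gaussian(0.0, 1.0), gaussian(3.0, 1.0)], [0.5, 0.5])
```

Because the evidence p(x) is the same for every class, the decision itself only requires comparing the products p(x | ω_j) P(ω_j).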


Training is the stage in the supervised learning approach that allows us to acquire knowledge of the classes, compute posterior probabilities, and hence classify the data. In training, parametric or non-parametric estimation of the probability density functions (pdfs) of object features is done using feature values of objects whose class membership is known at estimation time.

There exist different methods for computing pdfs. Maximum-likelihood and Bayesian parameter estimation are methods used when we are certain about the form of the probability functions but the parameters that completely define the functions are unknown (e.g., we know that the function is Gaussian but the mean and variance are unknown). Non-parametric methods are more flexible in the sense that no assumption is made about what functions define the probabilities; therefore, any density can be estimated without computing function parameters. Although it is extremely convenient to use parametric functions for Bayesian classification, at the learning stage few class features behave as single-modal or multi-modal parametric functions; therefore, the non-parametric option is used in those cases. Parzen windowing and k-nearest neighbor estimation are examples of non-parametric methods.

Pdf definition by Parzen windowing. For distribution computation using Parzen windowing, a gridding of the feature space is carried out by assigning discrete values to each of the d feature dimensions. After defining a grid at some resolution, we can compute the distribution value at each d-dimensional feature point x of the grid by:

    p(x) = (1/n) Σ_{i=1}^{n} (1/V_n) φ((x − x_i) / h_n)    (2-6)

where n is the number of d-dimensional training points, V_n = h_n^d is the volume of the Parzen window, x_i is each one of the training points of the specific class, and φ is the Gaussian function:


    φ(u) = (1/√(2π)) e^(−u²/2)    (2-7)

The size of the Parzen window can be computed using:

    h_n = h_1 / √n    (2-8)

where n is the number of training points for each class and h_1 is an initial window size. For large values of d, the distribution computation becomes time consuming and other approaches have to be sought.

Pdf definition by k-nearest neighbor. When k-nearest neighbor estimation is used, the posterior probability computed for each point is given by:

    P(ω_j | x) = p(x | ω_j) P(ω_j) / p(x) = k_j / k    (2-9)

where P(ω_j | x) is the probability that the object having feature value x belongs to the class ω_j, k is the number of neighbors used for estimation, and k_j is the number of those neighbors in class ω_j. A key aspect of the method is the definition of the neighborhood distance used to find which objects from the training data are the neighbors of feature value x. The explicit computation of the prior probability P(ω_j), the likelihood p(x | ω_j), and the evidence p(x) is bypassed, and the estimation turns out to be a nearest neighbor search in the feature space. Once the k nearest neighbors are found for each point, the point is assigned to the most frequent class among the k neighbors.

2.3.3 Clustering

Unsupervised learning is another way to classify data, and it is based on object feature analysis. Clustering, as this kind of learning is also called, labels data objects without using previous training data; therefore, from the user's point of view, it is the easier way to classify.
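Stepping back briefly to the two non-parametric estimators above, Eqs. 2-6 through 2-9 can be sketched in one dimension. This is a toy illustration only; the feature space actually used in this work is high-dimensional:

```python
import math

def parzen_density(x, samples, h1):
    # Eqs. 2-6 to 2-8 in 1-D: window size h_n = h1 / sqrt(n),
    # Gaussian kernel phi(u) = exp(-u^2/2) / sqrt(2*pi).
    n = len(samples)
    hn = h1 / math.sqrt(n)
    phi = lambda u: math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)
    return sum(phi((x - xi) / hn) for xi in samples) / (n * hn)

def knn_posteriors(x, labelled, k):
    # Eq. 2-9: P(w_j | x) = k_j / k over the k nearest training samples,
    # where labelled is a list of (feature_value, class_label) pairs.
    nearest = sorted(labelled, key=lambda s: abs(s[0] - x))[:k]
    counts = {}
    for _, label in nearest:
        counts[label] = counts.get(label, 0) + 1
    return {label: c / k for label, c in counts.items()}
```

For instance, `knn_posteriors(1.0, [(0.9, "tree"), (1.1, "tree"), (5.0, "building")], 3)` yields posteriors 2/3 for "tree" and 1/3 for "building", and the point would be labeled "tree".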


However, because its outcome depends on clustering algorithms and criteria, different results may be obtained for a single dataset depending on which strategy is followed.

The primary clustering purpose is to group data objects into classes made up of objects that are very similar to the other objects within the class. The similarity is evaluated using proximity measures and some criterion for analyzing the object features. A more formal definition of clustering is given by Theodoridis and Koutroumbas (2003). They mathematically define an m-clustering of a dataset X as a partition of X into m sets such that:

    C_i ≠ ∅, i = 1, …, m
    ∪_{i=1}^{m} C_i = X
    C_i ∩ C_j = ∅, i ≠ j, i, j = 1, …, m

The proximity measures are defined according to the data object features. When analyzing real-valued features, for example, the Euclidean distance can be used as a measure. The clustering criterion determines how to assign an object to a cluster using the measure. In general, though depending on the clustering method, the measure is computed between data objects and current clusters so that similarity between the current clusters and the analyzed object can be assessed. Because of that comparison, clusters must have a representative that summarizes the cluster description in object feature terms and can be compared to each data point. Circular-shaped clusters in a two-dimensional feature space can be represented by points; linear-shaped clusters in the same space, on the other hand, can be represented by lines.

There are several clustering methods. Most fall within the sequential, agglomerative, and divisive categories. In sequential algorithms, each member of the dataset is compared to the current clusters and assigned to the most similar one. The cluster definition may depend on the order in which the objects are analyzed.
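A minimal sequential scheme of the kind just described can be sketched in a few lines. This is a one-dimensional illustration with invented data; real implementations use multidimensional proximity measures and refinement passes:

```python
def sequential_cluster(points, threshold):
    # Basic sequential scheme: assign each point to the nearest existing
    # cluster mean if it is within threshold, otherwise start a new cluster.
    # Note the order-dependence mentioned above: a different presentation
    # order of the points can yield different clusters.
    means, members = [], []
    for p in points:
        if means:
            d, i = min((abs(p - m), i) for i, m in enumerate(means))
            if d <= threshold:
                members[i].append(p)
                means[i] = sum(members[i]) / len(members[i])
                continue
        means.append(p)
        members.append([p])
    return members

clusters = sequential_cluster([1.0, 1.2, 8.0, 0.9, 8.3], threshold=2.0)
```

Here two clusters emerge without the number of clusters being specified in advance, which illustrates the closing remark of this section: for most actual clustering analyses the number of clusters is a result of the process.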


Agglomerative and divisive algorithms are similar in the sense that, at some stage of the analysis, the whole dataset defines natural clusters without any further processing. Agglomerative methods start by considering every data object a cluster, and larger clusters are formed as partitions are created by grouping similar objects. Divisive algorithms consider the whole dataset as the only cluster at the beginning of the analysis; then the least similar objects become the basis for other new clusters. Although some algorithms exploit the fact that the number of clusters is known beforehand, for most actual clustering analyses the number of clusters is a result of the process.

2.3.4 Neighborhood Systems

One of the most important steps of any classification process based on statistical pattern recognition is feature selection. However, when ALSM data are processed, the way these features are computed becomes a decisive factor in the quality of the feature values. ALSM data consist of thousands and sometimes millions of three-dimensional points representing topographic locations, with an intensity value associated with each point. Because of the method used for their capture, the distribution of points within the point cloud is irregular (the distances between points are not uniform).

To classify points, feature computation based solely on each individual point offers the intensity value as the only possible additional feature for classification; therefore, the computation of features based on the neighborhood of each point has been used on ALSM data. Gridding is a method that transforms Lidar data into a regular two-dimensional array. For each cell of the array, features are computed using the Lidar points inside the cell. The same feature values are assigned to every point inside a cell because all the points in the cell have the same neighbors (Luzum and Slatton, 2005). Although a matrix array offers computational


advantages when processing, the individual classification of each point with respect to its neighbors is not possible.

One way to overcome the restrictions described above is to use a different neighborhood for each point. Several approaches have been proposed, and most of them are based on how the projections of the points onto the X-Y plane are spatially related. Triangular irregular networks and circular neighborhoods centered at each point are examples of these methods. Basically, these methods do not consider the three-dimensional aspect of Lidar data in the neighborhood computation.

To take advantage of the three-dimensional structure of Lidar data, neighborhood systems that consider heights are required. Filin and Pfeifer (2005) propose new approaches for defining neighborhoods. One of them is a variation of the circular (cylindrical in 3D) neighborhood. Instead of an infinite-height cylinder centered at each point, a "bounded vertical cylinder" restricts the neighborhood to points within a set range of heights. A slight variation of this approach is an "inclined bounded cylinder" that follows the slope of the plane tangent to the surface at the point. These neighborhoods can be used to group spatially nearby points and compute features from those points. Figure 2-5 shows how the neighborhood for a tree point includes or excludes roof points, according to the neighborhood system used.

The formal description of the neighborhood is the following (Filin and Pfeifer, 2005). With L the set of laser points, N_c(P) the cylindrical neighborhood of radius r and height 2h, and P = (p_x, p_y, p_z) the point where the cylinder is centered, the neighborhood is defined as:

    N_c(P) = { Q ∈ L : √((p_x − q_x)² + (p_y − q_y)²) ≤ r, |p_z − q_z| ≤ h }    (2-6)

Based on this neighborhood, a slope-adaptive neighborhood N_A(P) can be defined as:


    N_A(P) = { Q ∈ N_c(P) : |(P − Q) · n| ≤ d }    (2-7)

where n is the normal vector to the tangent plane at the point P.

2.3.5 Spin Images

The primary descriptor used in this study for classification purposes is the spin image. These images, when applied to individual points, produce feature values related to the geometric relationships between the points in the local neighborhood of the analyzed point. Spin images depend on the definition of local reference frames whose orientations are tied to the local surface. In the next chapter, a more precise definition of a spin image, and of how spin images are applied to classification, is given.

Figure 2-5. Neighborhood systems for ALSM points. A) Neighborhood (red points) of a tree point (green point) defined by the points inside an infinite-height cylinder. B) Neighborhood definition for the same points using a finite-height cylinder.
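The two neighborhood definitions of this section can be written directly from Eqs. 2-6 and 2-7. The sketch below uses a brute-force search over the point list; a production implementation would use a spatial index:

```python
import math

def cylindrical_neighborhood(P, points, r, h):
    # Eq. 2-6 (Filin and Pfeifer, 2005): points within horizontal radius r
    # and vertical distance h of P (a bounded vertical cylinder).
    px, py, pz = P
    return [Q for Q in points
            if math.hypot(Q[0] - px, Q[1] - py) <= r and abs(Q[2] - pz) <= h]

def slope_adaptive_neighborhood(P, points, r, h, normal, d):
    # Eq. 2-7: cylindrical neighbors whose distance to the tangent plane
    # through P (with unit normal `normal`) is at most d.
    nx, ny, nz = normal
    out = []
    for Q in cylindrical_neighborhood(P, points, r, h):
        dist = abs((Q[0] - P[0]) * nx + (Q[1] - P[1]) * ny + (Q[2] - P[2]) * nz)
        if dist <= d:
            out.append(Q)
    return out
```

With a horizontal tangent plane (normal (0, 0, 1)), the slope-adaptive set simply trims the bounded cylinder further in height; on a pitched roof the tilted plane is what keeps overhanging canopy returns out of a roof point's neighborhood, as illustrated in Figure 2-5.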


CHAPTER 3
CLASSIFICATION OF BUILDING INFRASTRUCTURE IN AIRBORNE LIDAR POINTS USING SPIN IMAGES

As Airborne Laser Swath Mapping (ALSM) technology has become available to the remote sensing research community, many fields of application have been proposed. For many of those applications, the necessary processing steps include filtering of the data. Filtering can be seen as a mid-level segmenting/classifying step where the segmented/classified data are used for higher-level recognition and extraction tasks.

Segmentation and classification of ALSM data pose a challenging problem because the terrain and land cover are discretely sampled by the laser pulses. Gaps between laser footprints on the ground are usually 0.5 to more than 1.0 meter. Thus, returns from man-made features such as building roofs can exhibit statistical variability on the same order as vegetation and yield irregular edges when interpolated into elevation images. The proximity of buildings and vegetation in urban and suburban areas often precludes clear definition of class boundaries. Adding to the complexity, classes often overlap in the horizontal dimensions (consider a tree overhanging part of a house) when viewed from above.

The separation of urban infrastructure, represented by buildings, houses, and other man-made objects, from trees, in a manner that fully exploits the 3D point data, is the main goal of this chapter. Knowing that classification results depend on the definition of representative data features and on a method that effectively processes the data, the use of a statistical pattern recognition method combined with a carefully chosen data feature for ALSM data will allow us to classify the data into building and non-building classes. The work presented here focuses on segmentation of building and tree points in ALSM point cloud data using a local subspace mapping known as the spin image. Some of the methods used to date are described in Section 3.1.
In Section 3.2, the data acquisition and preprocessing are described. The segmentation approach


is presented in Section 3.3. Section 3.4 presents a sensitivity analysis of the classification accuracy with respect to the spin image parameters. In Section 3.5, a complexity analysis of the general algorithm is presented. Qualitative and quantitative results are shown in Section 3.6, and conclusions are given in Section 3.7.

3.1 Background

Classification of ALSM point data is a process usually done as a preprocessing step for subsequent object modeling over the classified data. As such, this classification is a critical step in the modeling process. Sometimes segmentation is done before a complete classification. Several approaches for segmenting and classifying ALSM data have been proposed that exploit specific ALSM data characteristics, such as multiple pulse returns and the intensity information associated with each pulse, in addition to the implicit geometric information. Among them, region growing, clustering, vertical-profile-based segmentation, and slope-based segmentation are the most common.

One of the first attempts to separate building from tree information is presented by Haala and Brenner (1999). In this work, using multispectral imagery, a DSM and DTM are obtained from the Lidar data and combined with spectral information to segment the data. Axelsson (1999), on the other hand, uses only Lidar data and analyzes scan lines to extract building surfaces. The idea behind his approach is that scan lines on buildings have less height variability than those on vegetation, and first and second derivatives of the height differences computed at point locations are used for segmentation. Following this idea, some authors have used first and second differences in height between Lidar points in a sliding window for segmentation.

The idea of local height variability was also used by Morgan and Tempfli (2000) in a morphological filter that analyzes this property of point clouds inside windows of several sizes.
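The scan-line height-difference idea attributed to Axelsson (1999) above reduces to a few lines of code. The height profiles below are invented for illustration:

```python
def height_differences(z):
    # First and second differences of heights along a scan line: planar
    # roofs give small, smooth values; vegetation gives large, erratic ones.
    first = [z[i + 1] - z[i] for i in range(len(z) - 1)]
    second = [first[i + 1] - first[i] for i in range(len(first) - 1)]
    return first, second

roof_profile = [10.0, 10.1, 10.2, 10.3]   # constant-slope roof returns
tree_profile = [10.0, 12.5, 9.0, 13.0]    # erratic canopy returns
```

A constant-slope roof has nonzero first differences but near-zero second differences, so thresholding the second difference separates planar surfaces from canopy even when the roof is pitched.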
A deeper study of how not only height but also intensity variability can be used for the building-


tree separation problem is presented by Luzum and Slatton (2005). Among a set of features, the standard deviations of the height and intensity values from first and last pulse returns are found to be good features for discrimination. Using a DSM derived from Lidar data, Haithcoat and Song (2001) segment building and non-building points by building footprint detection based on height and gradient information at each point. Gradients are computed by convolving masks, Gaussian kernels in this case, with the images, which allows points on building footprints to be detected.

In the work of Alharthy and Bethel (2002), the multi-return values and local variability are also used for filtering purposes. Using only gridded Lidar data, the authors show an algorithm to detect and reconstruct 3D features such as buildings. Statistical inference about surfaces fitted in small windows over the data points is the basis of this work. First and last pulse returns are used to separate hard from soft surfaces. From the segmented point cloud, trees and other objects are filtered out by looking at the local height variability. This is done by fitting a plane to the points within a small window and computing the residuals to the plane. Points with a low RMSE (root mean square error) should belong to the building class.

As an example of region-growing segmentation, Zhang et al. (2006) present a framework containing a set of algorithms to obtain building footprints. The three main tasks in the framework are: filtering of non-ground points, building detection from the non-ground points, and building footprint delineation. A region-growing algorithm is used for building segmentation; it is based on fitting a plane to the points surrounding the point to be classified. Such direct fitting of planes to points can have difficulty discriminating among complex rooftop structures or architectures.
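The plane-residual test described above for Alharthy and Bethel (2002) can be sketched as a small least-squares fit. This is an illustration in pure Python with invented sample windows, not the authors' implementation:

```python
import math

def _solve3(M, v):
    # Solve a 3x3 linear system by Gaussian elimination with partial pivoting.
    A = [row[:] + [v[i]] for i, row in enumerate(M)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(col + 1, 3):
            f = A[r][col] / A[col][col]
            for c in range(col, 4):
                A[r][c] -= f * A[col][c]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        x[r] = (A[r][3] - sum(A[r][c] * x[c] for c in range(r + 1, 3))) / A[r][r]
    return x

def plane_rmse(points):
    # Fit z = a*x + b*y + c by least squares (normal equations) and return
    # the RMSE of the vertical residuals: low values suggest a planar
    # rooftop patch, high values suggest vegetation.
    n = len(points)
    sx = sum(x for x, _, _ in points); sy = sum(y for _, y, _ in points)
    sxx = sum(x * x for x, _, _ in points); syy = sum(y * y for _, y, _ in points)
    sxy = sum(x * y for x, y, _ in points); sz = sum(z for _, _, z in points)
    sxz = sum(x * z for x, _, z in points); syz = sum(y * z for _, y, z in points)
    a, b, c = _solve3([[sxx, sxy, sx], [sxy, syy, sy], [sx, sy, n]], [sxz, syz, sz])
    res = [z - (a * x + b * y + c) for x, y, z in points]
    return math.sqrt(sum(r * r for r in res) / n)

roof = [(0.0, 0.0, 10.0), (1.0, 0.0, 10.2), (0.0, 1.0, 10.1), (1.0, 1.0, 10.3)]
canopy = [(0.0, 0.0, 10.0), (1.0, 0.0, 14.0), (0.0, 1.0, 8.0), (1.0, 1.0, 9.0)]
```

The roof window is exactly planar and yields an RMSE near zero, while the erratic canopy window yields a large residual, which is the discriminant both Alharthy and Bethel (2002) and the region-growing scheme of Zhang et al. (2006) rely on.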


Filin and Pfeifer (2006), on the other hand, propose a slope-based method to compute features, combined with a clustering approach, to segment ALSM data. The segmentation is expected to yield segments or surfaces in which the included points have similar features. The segmentation is intended as a prior step to the final classification process. The features are computed for each Lidar point using the neighboring points around it. The neighborhood is adaptive in the sense that the points are chosen according to their inclusion in a slope-oriented cylinder. While clustering approaches are generally less sensitive to irregular rooftop structures than plane-fitting methods, the clustering of disparate features can lead to ambiguous classification decisions.

Sithole and Vosselman (2006) propose an algorithm to detect bridges from previously segmented ALSM data. The main step of the process is segmentation. In this step, vertical line profiles are used to partition the point cloud, an idea similar to the one used by Axelsson (1999). The profiles are taken in several directions (vertical, horizontal, tilted), resulting in as many partitions as directions considered. Then, for each partition, each one of the profiles is segmented into discontinuous curves. Shan and Sampath (2005) also use a similar approach for segmenting data, but they use the scan lines defined in the data collection process.

Clustering methods have also been used as the primary method for segmenting ALSM data. Elberink and Maas (2000) show a method to segment Lidar point clouds using unsupervised classification and anisotropic height texture measures. The algorithm requires gridded data and prior normalization of a digital surface model (DSM). Definition of co-occurrence matrices based on height texture is required.
By combining measurements of these matrices with first and last pulse measurements in a k-means clustering process, classification of points into trees, buildings, and other structures can be done.


3D modeling has become an important application in urban planning, environmental monitoring, and civil engineering (Shiode, 2000). As stated before, segmentation and classification of ALSM data are necessary and relevant steps for modeling. In modeling, segmentation, and recognition using laser point clouds, automation is the main point of research. Automation is potentially problematic for many of the reviewed approaches. The requirement for case-specific thresholds increases the dependency of the processing on external information provided by the user. In addition, approaches that grid the data impose information loss and do not exploit the richness of three-dimensional point data. To cope with these weaknesses, we propose a statistics-based method able to process each point from ALSM clouds.

3.2 Data Acquisition and Pre-Processing

To test the classification method, ALSM observations covering a suburban area containing trees and building infrastructure were required, and using a nearby area was desirable because it allowed convenient collection of ground truth measurements. The data were collected over the University of Florida campus and neighboring urban areas in December 2006, using the Optech 1233 system described in Chapter 2. The initial reduction and processing of the ALSM observations followed standard procedures developed by UF researchers.

The University of Florida campus is located in an area of relatively low topographic relief, but contains buildings with a wide variety of heights, forms, building materials, and rooftop geometries. There are also many different tree types next to the buildings, making traditional segmentation based on height statistics, reflectance statistics, or pre-defined shape functions problematic. One swath was collected over the campus in the 2006 flight. These data were augmented by observations collected in 2004, covering urban neighborhoods near the campus.
All of the swaths included in this study are nominally 400 meters wide and 3 km long, and the spacing


Figure 3-1. ALSM scan line boundaries of the data taken in Gainesville and at the University of Florida campus. The rightmost scan lines correspond to campus data. The two leftmost scan lines correspond to the off-campus nearby neighborhood. The x and y axes are measured in UTM (zone 17) meters.

between footprints is about 1 meter. Figure 3-1 shows a photograph of the scanned area along with the scan lines of the collected data.

Separation of building points from tree points requires a prior filtering of the data. Filtering ground points out of the original data must first be done. Several approaches for classifying points as "ground" or "non-ground" have been proposed. A good review of some algorithms for extracting ground points based on data interpolation, slope computations, iterative surface estimation, and active contours can be found in the work of Sithole and Vosselman (2004). Other filters are based on statistics and information theory (Kampa and Slatton, 2004). Because the area to be processed presents smooth terrain, the classification was carried out through a combination of a morphological filter


and the use of a commercial software package known as MARS. This method allows us to process the data in a fast and adequate manner. The scan lines are first processed by a filter that iteratively looks for the lowest points in spatial data windows and builds a ground surface from those points. The process iterates from larger window sizes to smaller ones. After processing, ground point candidates are saved for subsequent analysis. The candidate points were then processed using the MARS filtering tools (Merrick and Company, 2007), where the candidate ground points are assigned a final classification based on selected conditions, including spatial closeness and relief change. Figure 3-2 shows preprocessed ALSM data.

3.3 Classification Approach

Looking at the core of the methods reviewed in the introduction section for classifying ALSM data into building and tree classes, a common trend is observed. Some of them, which are based on morphological operations, rely on user knowledge to define decision thresholds.

Figure 3-2. Perspective view of a surface built using preprocessed ALSM data. The ALSM data are the leftmost selected measurements from the off-campus scanned data shown in Figure 3-1. For illustration purposes, the segmented data have been transformed into a DEM and the grid cells colored according to the class to which most points in each cell belong. Darker areas represent non-ground point surfaces. Lighter areas represent ground point surfaces.
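The iterative, coarse-to-fine windowed-minimum idea described above can be sketched as follows. This is a minimal illustration only; the function name, window sizes, and height tolerance are our assumptions, not the actual UF or MARS implementation.

```python
def ground_candidates(points, window_sizes=(16.0, 8.0, 4.0), height_tol=0.5):
    """Flag candidate ground points by comparing each point's elevation to
    the lowest elevation found among current candidates inside shrinking
    spatial windows. A point survives only if it stays within height_tol of
    the local minimum at every window size (coarse to fine)."""
    candidates = list(points)
    for w in window_sizes:
        kept = []
        for (x, y, z) in candidates:
            # lowest elevation among candidates inside the w-by-w window
            zmin = min(pz for (px, py, pz) in candidates
                       if abs(px - x) <= w / 2 and abs(py - y) <= w / 2)
            if z - zmin <= height_tol:
                kept.append((x, y, z))
        candidates = kept
    return candidates
```

On a toy patch of near-flat terrain with one elevated (roof) return, the elevated point fails the height test at the first window size and is excluded from the ground candidates.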


Other approaches require the whole set of data to be available before classification starts, in order to compute triangulated surfaces or to grid the data. On the other hand, statistics-based methods that acquire pattern characteristics for posterior recognition have been shown to be feasible for classification of urban data (Cord and Declercq, 2001). Based on the results reported with these methods, it appears that a number of the identified weaknesses might be overcome by using a method based on machine learning.

The proposed classification approach is based on a supervised learning scheme that acquires knowledge from training data statistics. When a statistical supervised learning approach is used, the selection of features and the estimation of the class-dependent probability density functions (pdfs), i.e., likelihoods, are key aspects of algorithm performance. In our case, spin images (Johnson, 1997) are used for object feature extraction. Spin images are well suited for representing 3D geometry in a 2D geometric subspace; however, they require a high-dimensional feature space. The k-nearest neighbor rule is therefore used to provide a computationally efficient decision scheme based on the class likelihoods. Also, because at classification time the feature computation for each point needs data only from a small local neighborhood, global knowledge of the entire point cloud is not needed. No gridding of the data is required because the decision rules are applied to each point. That characteristic is particularly beneficial for near-real-time classification. Below, a more detailed discussion of the feature space and the spin image computation is given. Also, details of the training and classification steps in the scheme are presented.

3.3.1 Point Feature Computation through Spin Images

Features are the characteristics of the objects that provide meaningful values for classification purposes.
Depending on how the data are processed, features are computed either for ALSM data points, when a point cloud is processed, or for matrix cells that average point values, when gridded data from lidar point clouds are used.

Most of the first attempts to classify ALSM data relied on transforming the data into gridded top-view images of the scanned area. The use of two-dimensional images is very convenient because of the many image processing tools available for analyzing the data. However, depending on the grid size, data can be lost when averaging inside the cell is done. Also, the use of third-dimension information is restricted to what can be mapped to two-dimensional images.

Using point cloud data instead of gridded data, and adding point heights to the analysis, provides new points of view for classification tasks. When lidar point cloud data are viewed from directly overhead, it is difficult to identify many objects. However, if vertical cross sections are viewed, it is often possible to identify objects based on their geometry. These ideas are used by Sithole and Vosselman (2006) and Shan and Sampath (2005) to perform initial segmentation of ALSM observations into ground/non-ground classifications.

When gridded data are used, neighborhood operations are performed using the cells surrounding the cell in which the features are computed. Having a point cloud allows us to select those neighbors in a variety of ways; therefore, volumetric boundaries can be chosen instead of two-dimensional windows. Neighborhood systems that look for the points inside cylindrical shapes centered at a lidar point, and that take the local surface into account, were proposed by Filin and Pfeifer (2005). Accordingly, a feature descriptor that accounts for vertical profiles and computes neighborhoods according to local surfaces could represent an alternative way to describe lidar points.

Spin images, a shape descriptor used in the Computer Vision community (Johnson, 1997)


that collapses local, rather than global, 3D information into two dimensions, is a feature descriptor that uses information from many profiles oriented with respect to the normal of the local surface around the region covered by the image. These characteristics make spin images the best option for computing point features in our classification scheme. The feature space dimension is defined by the number of pixels (features) a spin image has.

To compute a spin image for a neighborhood of points centered at the current point under consideration, or considered point (CP), in the 3D point cloud, the CP is referenced in terms of a local 3D coordinate frame. The considered point defines the origin of this local frame. The local $x'y'$ plane is defined to be the plane that passes through the considered point and is tangent to the set of neighboring points. Thus, the spin image is a mapping of the 3D coordinates of the neighborhood to a 2D image whose pixel values represent the number of points at a certain radial distance in the $x'y'$ plane and at a certain height along the local normal ($z'$-axis).

More formally, a spin image is a point signature that uses a mapping function to extract a value pair $(\alpha, \beta)$ from the three-dimensional coordinates $(x, y, z)$ of each point. The mapping is defined by Johnson (1997):

$f : \mathbb{R}^3 \rightarrow \mathbb{R}^2$, $f(x) \rightarrow (\alpha, \beta)$, with $\alpha = \sqrt{\|x - p\|^2 - (n \cdot (x - p))^2}$ and $\beta = n \cdot (x - p)$ (3-1)

where $(p, n)$ are the main point and normal defining the basis of the local coordinate system, $\alpha$ is the radial distance in the local $x'y'$ plane, computed from the $x'$ and $y'$ coordinates, and $\beta$ is the $z'$ coordinate in this new frame. The spin image is built by computing a histogram where each cell, located at a specific row and column position, stores the count of points located in specific $\alpha$ and $\beta$ value ranges.
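As a concrete illustration, the mapping of Equation 3-1 can be sketched in a few lines of Python (a minimal sketch; the function name and tuple conventions are ours):

```python
import math

def spin_coords(p, n, x):
    """Map a 3D point x to spin-image coordinates (alpha, beta) relative to
    the oriented basis point p with unit surface normal n (Equation 3-1).
    beta is the signed elevation of x along n; alpha is the radial distance
    of x in the tangent (x'y') plane."""
    d = [x[i] - p[i] for i in range(3)]
    beta = sum(n[i] * d[i] for i in range(3))      # n . (x - p)
    sq = sum(c * c for c in d) - beta * beta       # ||x - p||^2 - beta^2
    alpha = math.sqrt(max(sq, 0.0))                # guard against rounding
    return alpha, beta
```

For a point 3 m east, 4 m north, and 2 m above a basis point with a vertical normal, this yields $\alpha = 5$ and $\beta = 2$, as expected from the Pythagorean relation in Equation 3-1.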


The size of the cell (bin size), representing $\alpha$ and $\beta$ ranges, is defined according to the resolution of the point cloud data. This size, along with the local neighborhood dimensions, defines the number of columns and rows an image has; thus, if a cylinder centered at the CP and slope-oriented with respect to the local $x'y'$ plane is used, the number of columns $N_\alpha$ is given by $N_\alpha = R / b_S$ and the number of rows $N_\beta$ is given by $N_\beta = H / b_S$, where $R$ and $H$ are, respectively, the radius and height of the cylinder, and $b_S$ is the bin size. Because $\beta$ values can be negative or positive, in contrast to $\alpha$ values, which are always positive, cell positions in the middle row of the image correspond to $\beta$ values near zero. Depending on the bin size of the image, more than one point can be assigned to the same pixel, so an intensity value for the pixel can be computed. Figure 3-3 shows a spin image generated at an ALSM point.

Figure 3-3. Spin image generation. A) Neighborhood for spin image generation. The image is generated at the starred main point in a point cloud corresponding to a house roof. Squared points belong to the neighborhood of the main point used for computing the image. B) Spin image. The bin size for this image is 0.5 m.
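The histogram accumulation just described can be sketched as follows (an illustrative implementation; the row/column conventions are our assumptions):

```python
def spin_image(coords, R, H, bin_size):
    """Accumulate (alpha, beta) pairs into a 2D histogram (the spin image).
    Columns span alpha in [0, R); rows span beta in [-H/2, H/2), so the
    middle rows correspond to beta near zero, as described in the text."""
    n_cols = int(R / bin_size)      # N_alpha = R / b_S
    n_rows = int(H / bin_size)      # N_beta  = H / b_S
    img = [[0] * n_cols for _ in range(n_rows)]
    for alpha, beta in coords:
        col = int(alpha / bin_size)
        row = int((beta + H / 2) / bin_size)
        if 0 <= col < n_cols and 0 <= row < n_rows:
            img[row][col] += 1      # more than one point may share a pixel
    return img
```

Two neighborhood points at nearly the same $(\alpha, \beta)$ fall into the same bin, producing the pixel "intensity" values mentioned above.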


Although some filtering methods have used 3D neighborhoods, the features extracted from the neighborhood analysis result in single scalar values, such as the height or intensity variance, or how much a maximum height or intensity deviates from an average value inside that neighborhood. The number of available features obtained through spin images, on the other hand, is given by the number of pixels ($N_\alpha \times N_\beta$). The spin images' 3D nature is represented by those pixels, which account for the whole geometric variability inside the neighborhood. The fact that each one of those pixels represents a feature for classification makes spin images better suited for classification in dense building/tree mixtures.

The proposed approach for computing spin images has two main differences with respect to the way they are computed by Johnson (1997). First, instead of a triangulation mesh for choosing neighbors, a cylindrical neighborhood is defined for each point to be classified. Second, the local surface is defined by fitting a plane using a modified Hough transform of the neighborhood points, instead of the matrix-based method used by Johnson (1997). Next, details on the modified Hough transform are given.

3.3.2 Tangent Plane Fitting through a Modified Hough Transform Algorithm

Spin image generation, and its usefulness for discriminating objects, depends mostly on how well the tangent plane is fit to the local surface on which each main point is located. That surface can be estimated from a local neighborhood used both for fitting a local plane and for computing the spin image. Using this neighborhood, various methods can be used to fit a plane, among them least squares estimation, RANSAC (Random Sample Consensus) based algorithms, and the Hough transform.

Plane parameter estimation based on least squares is not robust in the presence of outliers, a common situation when the neighborhood used for fitting a point on a roof includes tree


points. On the other hand, RANSAC methods offer robustness to outliers. However, the prior definition of initial parameters for RANSAC adds complexity to the process. In addition, when RANSAC is used for fitting planes, the number of points making up the different surfaces belonging to a neighborhood sometimes makes it difficult to estimate the best plane on which the CP lies. That happens when a point on a flat surface, such as a building roof, is embedded in a neighborhood cloud where most points belong to canopy. Because of the majority of tree points in that local space, the plane fitted from the neighborhood of that roof point is not the best, due to the influence of non-roof points. Because of this disadvantage, also found in the least squares method, a Hough transform-based method is used to fit the plane.

The Hough transform was originally intended to fit a line about points in a two-dimensional space (Hough, 1962). This method has been extensively used for fitting not only lines but also other higher-dimensional objects to measured data (Leavers, 1992). The method works by voting in the parameter space, where the parameters depend on the object to fit, for all the possible objects a point measurement can lie on. For each point in the dataset, many votes for fitting objects are cast. The object represented by the set of parameters receiving the most votes defines the best fit. In the ALSM data case, the parameter space is defined by the set of spherical coordinate parameters ($\rho$, $\theta$, and $\varphi$) related to the three Cartesian coordinates $(x, y, z)$ by:

$\rho = x\cos\theta\sin\varphi + y\sin\theta\sin\varphi + z\cos\varphi$ (3-2)

where $\rho$ is the minimum distance of the plane to the origin, $\theta$ is the angle made by the $x$ axis and the intersection of the $xy$ plane with a plane perpendicular to the $xy$ plane and the fitting plane, and $\varphi$ is the angle made by the $x$ axis and the intersection of the $xz$ plane with a plane perpendicular to the $xz$ plane and the fitting plane. These parameters are computed in a local coordinate frame that circumscribes the point neighborhood.
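Equation 3-2 maps a plane orientation $(\theta, \varphi)$ and a point to the distance parameter $\rho$ of the plane through that point; all points lying on one plane yield the same $\rho$ for that plane's orientation. A small sketch (the function name is ours):

```python
import math

def plane_rho(point, theta, phi):
    """Distance parameter rho of the plane with orientation (theta, phi)
    passing through the given point (Equation 3-2)."""
    x, y, z = point
    return (x * math.cos(theta) * math.sin(phi)
            + y * math.sin(theta) * math.sin(phi)
            + z * math.cos(phi))
```

For example, any point on the horizontal plane $z = 3$ gives $\rho = 3$ when $\varphi = 0$, and any point on the vertical plane $x = 2$ gives $\rho \approx 2$ when $\theta = 0$ and $\varphi = \pi/2$.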


In the three-dimensional Hough transform described above, each point in the point coordinate space generates votes in the parameter space by computing, and discretizing, the value of $\rho$ for discrete values of $\theta$ and $\varphi$. The votes are stored in a three-dimensional matrix. Because in the general approach all the neighborhood points vote for different planes, there is no guarantee that the most-voted plane will pass through the CP.

If the most-voted plane passes through a point other than the CP, and its orientation does not follow the local surface at the CP, the origin of the local coordinate frame for spin image computation will be translated and rotated with respect to the optimal frame. This deviation in the frame would generate spin images whose pixel values would be displaced to other positions, thus failing to capture the right geometric composition of the local neighborhood. That finally results in a poor feature set for training and classification purposes.

To enforce that the fitting plane contains the CP, we modify the general algorithm. A first modification would be as follows. First, the neighborhood of the CP is selected and used to fit the plane. Then the main point is placed at the first position in the processing order. After that, the CP votes for all the planes passing through it in the parameter space. For each voted plane, a logical true value is assigned to the corresponding cell in the matrix representing the parameter space. The remaining points are processed afterward, but they are only allowed to vote for planes already voted for (those with a logical true value assigned). That restriction assures that the most-voted plane is one of the planes containing the main point.

For the implementation above, voting in the parameter space is a tedious task that involves computation over all discrete values of $\theta$ and $\varphi$ for each analyzed point. However, the definitive modification we used includes a way to check only one of those parameters,


resulting in a large saving in computational time. Details on how this is accomplished are explained below.

The Hough transform has been widely used in signal and image processing. Among the tasks in those fields, detection of linear features and fitting of parameterized curves to point samples have been carried out with this technique. In our algorithm, a modification of the transform allows us to select the plane that best fits a local point cloud and contains a specific sample point (the CP).

In the regular Hough transform, two-dimensional points in feature space generate votes in a discrete parameter space. The relationship between the feature coordinates and the parameters is given by

$\rho = x\cos\theta + y\sin\theta$ (3-3)

where $x$ and $y$ are the coordinates of a point, and $\rho$ and $\theta$ are the parameters defining, in parametric form, all the possible lines passing through that point. In that form, $\rho$ is the length of the normal from the origin to the line, and $\theta$ is the angle of that normal with respect to the $x$ axis. Also, for constant values of $\rho$ and $\theta$, the set of $(x, y)$ pairs that satisfy Equation 3-3 lie on the same line. This is basically the parametric definition of a line. Figure 3-4 below shows some points in the feature space and the corresponding curves in parameter space for values of $\theta$ ranging from 0 to 180 degrees. As can be inferred from the figure, curve intersections in the parameter space define the parameters of the unique lines that contain the points in feature space represented by the intersecting curves. When the transform is implemented, the parameter space is represented by a two-dimensional matrix whose columns and rows represent discrete $\theta$ and $\rho$ values, respectively. This matrix initially has zero values in its cells. For each point, the whole range of quantized


$\theta$ values is used to compute $\rho$ values, which are quantized too. The cell values corresponding to each of these discrete $(\theta, \rho)$ pairs are incremented (voted), stating that the point passes through the line represented by the cell. When several points are processed, cells with values greater than 1 indicate the presence of a line in the feature space, and the cell with the maximum value defines the line on which most of the points lie.

Figure 3-4. Points in feature space and their corresponding mapping in the Hough transform parameter space. A) Three points in the feature space. B) Corresponding curves in parameter space. For any (x, y) point in the feature space, when $\theta$ is 0 the corresponding $\rho$ value is equal to the x coordinate of the point. The curves result from applying the equation $\rho = x\cos\theta + y\sin\theta$ for every (x, y) point over the discrete $\theta$ domain.

If we are interested in knowing the line that passes through a specific point, that point can be processed first, and the cells incremented at that time are marked with a logical true value. Then, those cells are the only ones allowed to be incremented when the remaining points are processed. A better method to find that line shows that it is not necessary to examine all the $\theta$ values in the discrete domain, for either the first point or the subsequently processed points. The method is based on the fact that if we know the parametric curve for the first point, the one described by the cells in the parameter space, we need only compute the intersection of that curve with the curves for the following points.
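The curve-intersection shortcut can be made concrete with a small sketch (function names are ours): equating $\rho$ for two points gives the intersection angle in closed form, and substituting it back into the line equation recovers the shared $\rho$.

```python
import math

def intersect_theta(p0, pi):
    """Theta at which the Hough parameter curves of two 2D points intersect:
    equating rho for both points yields tan(theta) = (x0 - xi) / (yi - y0)."""
    (x0, y0), (xi, yi) = p0, pi
    return math.atan2(x0 - xi, yi - y0)

def line_rho(pt, theta):
    """rho = x cos(theta) + y sin(theta), the normal form of a line."""
    return pt[0] * math.cos(theta) + pt[1] * math.sin(theta)
```

At the returned $\theta$, both parameter curves yield the same $\rho$, so only that single cell needs to be visited instead of the whole discrete $\theta$ range.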


Looking at Figure 3-4, we see that the parameter curves for any pair of points intersect at one point. Using Equation 3-3 for a pair of points, the intersection in parameter space can be found. Setting the CP, with coordinates $(x_0, y_0)$, as the first point to be processed, and the remaining points $p_i$ with coordinates $(x_i, y_i)$ for $i = 1, \ldots, n$, where $n$ is the number of points in the feature space, the intersection of the curves is found where the $\rho$ values are equal for the two points. Using this fact, the $\theta$ value at which the curves intersect is given by

$x_0\cos\theta + y_0\sin\theta = x_i\cos\theta + y_i\sin\theta \;\Rightarrow\; (x_0 - x_i)\cos\theta = (y_i - y_0)\sin\theta \;\Rightarrow\; \theta = \tan^{-1}\!\left(\frac{x_0 - x_i}{y_i - y_0}\right)$ (3-4)

Knowing the coordinates of the first point and of each one of the following points, the $\theta$ values (and hence the $\rho$ values, when substituted into Equation 3-3) at which the curves in parameter space intersect for points $p_0$ and $p_i$ can be computed without checking all the discrete $\theta$ values. When quantized, the corresponding cell in the matrix can be incremented.

For the three-dimensional feature space, the planes passing through a point are related by Equation 3-2, where $x$, $y$, and $z$ are the coordinates of a point and $\rho$, $\theta$, and $\varphi$ are the parameters defining the planes. Similarly to the two-dimensional case, surface intersections in the parameter space define the planes to which the points in feature space that generate the surfaces belong. The implementation requires a three-dimensional matrix where each dimension corresponds to quantized values of $\rho$, $\theta$, and $\varphi$. Again, the cells voted for by the majority of points define the plane on which most points lie. When looking for the plane that passes through a specific point, it is not necessary for each feature point to iterate through all the possible discrete $(\theta, \varphi)$ pairs (generally ranging from 0 to


180 degrees for $\theta$ and from -90 to 90 degrees for $\varphi$). If we take the CP, with coordinates $(x_0, y_0, z_0)$, as the first point to be processed, and the remaining points $p_i$ with coordinates $(x_i, y_i, z_i)$ for $i = 1, \ldots, n$, where $n$ is the number of points in the feature space, the intersection of the surfaces is found where the $\rho$ values are equal for the point pair. By using Equation 3-2 and the equality condition for $\rho$, we have:

$(x_0 - x_i)\cos\theta\sin\varphi + (y_0 - y_i)\sin\theta\sin\varphi + (z_0 - z_i)\cos\varphi = 0$ (3-5)

Dividing Equation 3-5 by $\cos\varphi$, we have:

$(x_0 - x_i)\cos\theta\tan\varphi + (y_0 - y_i)\sin\theta\tan\varphi + (z_0 - z_i) = 0 \;\Rightarrow\; \varphi = \tan^{-1}\!\left(\frac{z_i - z_0}{(x_0 - x_i)\cos\theta + (y_0 - y_i)\sin\theta}\right)$ (3-6)

Accordingly, by using the coordinates of the first and the $i$th point, and iterating only through the possible values of $\theta$, the $\varphi$ values (and hence the $\rho$ values, when substituted into Equation 3-2) at which the surfaces in parameter space intersect for points $p_0$ and $p_i$ can be computed without checking all the discrete pairs. When quantized, the corresponding cell in the matrix can be incremented. By doing this, no iteration through the possible values of $\varphi$ is done, and computational time is saved.

The modified Hough transform, RANSAC-based, and least squares methods were tested to produce spin images in training and classification tasks. The best classification accuracy rates


were obtained when the fitting based on the modified Hough transform was used. Details on how the training and classification tasks are done are given next.

3.3.3 Training

Training is the stage in the supervised learning approach that allows us to acquire knowledge of the classes in order to compute posterior probabilities and hence classify the data. In training, parametric or non-parametric estimation of the probability density functions (pdfs) of object features is done by using feature values of objects whose class membership is known at estimation time.

Different methods exist for computing pdfs. Maximum-likelihood and Bayesian parameter estimation are used when we are certain about the form of the probabilistic functions but the parameters that completely define the function are unknown (e.g., we know that the function is Gaussian but the mean and variance are unknown) (Duda et al., 2001). Non-parametric methods are more flexible in the sense that no assumption is made about what function defines the probabilities; therefore, the pdf can be estimated without computing any function parameters. Parzen windowing and k-nearest neighbor estimation are examples of non-parametric methods.

In our case, the training is done to estimate pdfs for two classes: tree and building. The object features for each point belonging to the two training classes are computed by using spin images. Each pixel of the image is an object feature. Therefore, if spin images of size $N_\alpha \times N_\beta$, with $N_\beta$ the number of rows and $N_\alpha$ the number of columns, are used for training and classification, we must consider an $N_\alpha \times N_\beta$-dimensional feature space for pdf computation. Due to the spin image size and the non-stationarity of the data, it is impractical to fit a parametric function as the pdf. Given that, non-parametric methods should be used.


Parzen windowing was not considered because of the high computational effort needed when the feature dimension is large. Because the spin image size, depending on the point neighborhood and spin image bin sizes, can usually range from 100 to 400 pixels for this application, it was decided to use k-nearest neighbor estimation. When this kind of estimation is used, the feature set must be stored for use in the classification stage.

Training samples are extracted from areas located on class sites for both classes. Once they are selected as representative samples of each training class, the method requires feature extraction through spin image computation for each one of them, and image storage for subsequent classification of the test data. The computation of features requires three basic steps: neighborhood definition, local reference frame definition, and spin image computation. The neighborhood definition is done by selecting the points inside a cylinder of radius R and height R/2 centered at each point.

Once the neighborhood is set, a three-dimensional local reference frame whose XY plane is the best plane fitting the point neighborhood is defined. For building surfaces, that plane should be very similar to the roof faces. To fit the plane, the implementation of the modified 3D Hough transform is used. The last step for feature computation is the spin image computation itself. The size of the image, and hence the number of features, depends on the values of the radius R and the bin size ($b_S$). Accordingly, each spin image is a square matrix whose number of columns and rows depends on the parameters R and $b_S$, and is equal to $R / b_S$. After the training stage, the set of feature values for each sample point in each training class is stored to be used in the classification stage.
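The local reference frame step, i.e., the modified 3D Hough fit of Section 3.3.2, can be sketched as follows. This is an illustrative implementation under our own quantization choices (36 discrete $\theta$ values, $\varphi$ rounded to whole degrees), not the dissertation's actual code; note that it iterates only over $\theta$ and obtains $\varphi$ in closed form from Equation 3-6.

```python
import math

def fit_plane_through_cp(cp, neighbors, n_theta=36):
    """For each discrete theta, each neighbor votes for the single phi at
    which its parameter surface meets the CP's surface (Equation 3-6), so
    phi is never exhaustively scanned. Every candidate plane passes through
    the considered point cp by construction. Returns the most-voted
    (theta, phi) pair in radians."""
    x0, y0, z0 = cp
    votes = {}
    for it in range(n_theta):
        theta = math.pi * it / n_theta
        for (xi, yi, zi) in neighbors:
            denom = (x0 - xi) * math.cos(theta) + (y0 - yi) * math.sin(theta)
            if abs(denom) < 1e-12:
                continue  # phi undetermined for this theta/neighbor pair
            phi = math.atan((zi - z0) / denom)    # Equation 3-6, phi in (-90, 90) deg
            key = (it, round(math.degrees(phi)))  # quantize phi to 1 degree
            votes[key] = votes.get(key, 0) + 1
    (it, phi_deg), _ = max(votes.items(), key=lambda kv: kv[1])
    return math.pi * it / n_theta, math.radians(phi_deg)
```

For a CP on a horizontal roof patch, every neighbor votes $\varphi = 0$ at every admissible $\theta$, so the recovered plane is horizontal, as desired.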


3.3.4 Classification

In the proposed approach, classification implies computing features for each point to be classified, calculating the probabilities of the point belonging to each one of the two classes, and, according to this, making a decision about which class the point belongs to. Feature computation is done exactly as in the training stage. Calculation of probabilities and decision making are done through the k-nearest neighbor method for pdf estimation.

We obtain a 2-class, $N_\alpha \times N_\beta$-feature classification problem, where $N_\alpha \times N_\beta$ is the size of the spin image used for feature computation. The number of features, and the values each feature takes, is a function of the data set point density and the particular spin image parameters.

The k-nearest neighbor method was used instead of the Parzen windowing method because of the large data feature volume. In this approach, classification implies computing features for each point to be classified and applying the k-nearest neighbor (decision) rule to assign a class label to each point. However, using the k-nearest neighbor rule, one does not explicitly estimate a pdf. Rather, one obtains a simple frequency-of-occurrence ratio. The posterior probability computed for each point is given simply by

$P(\omega_j \mid x) = \frac{p(x \mid \omega_j)\,P(\omega_j)}{p(x)} = \frac{k_j}{k}$ (3-7)

where $P(\omega_j \mid x)$ is the posterior probability of the class $\omega_j$ given a feature value $x$, $k$ is the number of neighbors used for estimation, and $k_j$ is the number of those neighbors belonging to class $\omega_j$.

To apply the decision rule, the algorithm first searches for the k nearest neighbors of the considered point in feature space among the stored features of the training data. Euclidean distances in the $N_\alpha \times N_\beta$-dimensional feature space are used to select the neighbors. A C-code implementation of this search was used to achieve a speed-up over the original implementation. The tool is based on approximate


nearest neighbor searching and uses kd-trees as the data structures to store and search the data. Having the k neighbors, the decision rule is applied and the point is labeled with the most frequent class among those k neighbors.

A key aspect of obtaining good classification results is the definition of the parameters for spin image computation. The pixel size of the spin image (bin size), the angular resolution of the Hough transform, and the neighborhood radius for the Hough transform are the three parameters that must be specified. A sensitivity analysis of how the parameters influence classification accuracy rates is detailed below.

3.4 Sensitivity Analysis

Classification results are sensitive to the ALSM data characteristics and to the spin image parameters used for training and classification. The most important characteristic of ALSM data for classification is the density of the data set, expressed by the median spacing between consecutive ALSM footprints. The spin image parameters are the radius R of the cylinder used to define the main point neighborhood, the bin size ($b_S$), and the angle resolution ($a_R$) used for voting in the modified Hough transform scheme. It is worth mentioning that one particular parameter set is used for both the training and classification stages.

Looking at spacing resolution, it is expected that high-resolution data should not need a large radius R in order to select enough neighborhood points to adequately characterize a local surface. However, there must be a minimum radius value that makes it possible to capture the geometry of the surface. Also, in order to define meaningful features, the bin size has to be chosen so that the cells are able to summarize the surrounding surface about each point. For classification purposes, it would be expected that the bin sizes be big enough to capture geometric variability and not leave many bins empty.
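Returning to the decision rule, the frequency-of-occurrence posterior of Equation 3-7 can be sketched in a few lines (a brute-force search for illustration only; the actual implementation uses the approximate kd-tree search mentioned above):

```python
def knn_classify(x, train, k):
    """Apply the k-nearest-neighbor rule of Equation 3-7: find the k
    training samples closest to x in feature space, approximate each class
    posterior as k_j / k, and label x with the most frequent class.
    'train' is a list of (feature_vector, label) pairs."""
    sq_dist = lambda a, b: sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    nearest = sorted(train, key=lambda fl: sq_dist(fl[0], x))[:k]
    counts = {}
    for _, label in nearest:
        counts[label] = counts.get(label, 0) + 1
    posteriors = {label: c / k for label, c in counts.items()}
    return max(posteriors, key=posteriors.get), posteriors
```

With a test point surrounded by building samples and k = 3, all three nearest neighbors carry the building label, so the posterior for that class is 3/3 = 1.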
Because each bin accounts for the number of 3D neighbor points at a certain radius and height range from the CP, bin sizes between the point


spacing resolution and a fraction of it, perhaps one half or one quarter, could be expected to yield the best classification results. Regarding time complexity, we must note that as the bin size increases, the number of features decreases. However, for classification results, the number of points counted in each cell, and how that number relates to the values in neighboring pixels, is the more important factor.

In addition to the parameters defining the number of pixels of the spin image, the angle resolution $a_R$ plays an important role in both classification results and time complexity, the latter as explained in Section 3.5. It is expected that a finer resolution in the Hough transform parameters results in a better plane fit to the local surface, but in slower time performance. Accordingly, an $a_R$ value that balances the tradeoff between execution time and classification results is to be found. To obtain that value, we need to realize that as $a_R$ decreases, the spin image generated from the fitted plane changes less and less with respect to the spin image computed at the immediately previous, coarser resolution.

Analyzing those factors, a method to define these parameters as a function of the input data set must be defined to obtain better classification results. A direct method, in which several parameter sets are used to classify the data, could be used for that task. However, that approach would not be effective in practical classification tasks. A much better approach, usable in learning classification schemes, is to measure the classification potential by estimating the mutual information (MI) between the classes and the features of the training data. Computation of mutual information is used as a method to select spin image parameters that improve classification results. The particularities of its use in a high-dimensional feature space, and the experiments showing how useful this measurement is, are explained below.


3.4.1 Spin Image Parameter Selection by Using Mutual Information

Mutual information is a measurement from information theory that quantifies how much the uncertainty about one variable can be reduced by knowledge of another variable. As such, many authors have proposed the use of MI to evaluate how good a set of training data is for classification. For this particular problem, we use MI to measure how well a set of feature variables obtained from training helps to classify ALSM data into the two values, building or tree, that the class variable can take. When two labels are available for classification, and the number of training samples in each class is close to half the total number of samples, MI takes values between 0 and 1. A value of zero means that no dependency exists between the feature variable and the class variable. If the variables are independent, the training data gives no help in reducing uncertainty about the class variable when an ALSM datum is to be classified. If the MI value is close to 1, on the other hand, the training data gives clues that reduce uncertainty in the decision making, and the closer to 1 the MI value is, the better the classification results should be.

Because both the proposed classification approach and MI computation require pdf or pmf (probability mass function) estimation for the class and feature variables, MI turns out to be a natural tool to assess the effectiveness of the training data for classification tasks. MI for a class variable $C$ and feature variable $X$ can be computed by (Cover and Thomas, 1991):

$I(C; X) = H(C) - H(C \mid X)$ (3-8)

where the entropy in Equation 3-8 can be computed by:

$H(Y) = -\sum_{y \in Y} P(y) \log_2 P(y)$ (3-9)

and the conditional entropy in Equation 3-8 is expressed by:


    H(C|X) = \sum_{x \in X} P(x) H(C | X = x)    (3-10)

When the conditional entropy H(C|X) is close to zero, knowing X greatly reduces the uncertainty about C; in Equation 3-8 that translates into a high value for MI. On the other hand, if C and X are independent, knowing X has no effect on the entropy of C. Then H(C) = H(C|X), and the value for MI in Equation 3-8 becomes zero.

The above equations are intended for discrete variables; therefore, the pdfs needed to compute Equations 3-9 and 3-10 reduce to pmf computations. That means these equations are applicable to our problem if our variables are discrete. In the case of the class C, we have just two classes, building and tree, C \in {C_B, C_T}. In the case of X, we deal with a feature set x in a space of N*N dimensions, where each x is a spin image having N*N pixels. Because each pixel has a discrete value, in theory these equations can be used.

In the case of low-dimensional features, the Parzen window is a plausible approach for computing pdfs; for high-dimensional features such as spin images, however, this computation becomes very demanding. For that reason, approaches designed for very high dimensional feature spaces have to be used. An approach based on nearest neighbor statistics is proposed by Kraskov et al. (2004). They define two methods to compute the mutual information I(X; Y) from samples of points distributed according to some joint probability density p(x, y). In contrast to methods based on binning the feature space, they propose an MI computation based on estimates of k-nearest neighbor distances, enhancing previous work on entropy estimation from nearest neighbors, in which the entropy H(X) is estimated from the distance to the k-nearest neighbor averaged over all x in X.
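For fully discrete variables, Equations 3-8 through 3-10 can be evaluated directly by counting. The following sketch, with hypothetical function names, shows the computation for a small labeled sample; it is meant only to make the definitions concrete, not to handle the high-dimensional spin image case discussed next.

```python
import math
from collections import Counter

def entropy(labels):
    """H(Y) of Equation 3-9, with the pmf estimated by counting."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def mutual_information(features, labels):
    """I(C; X) = H(C) - H(C|X) (Equations 3-8 and 3-10) for a discrete
    feature, weighting each H(C | X = x) by the counted P(x)."""
    n = len(labels)
    h_cond = 0.0
    for x, c in Counter(features).items():
        subset = [lab for f, lab in zip(features, labels) if f == x]
        h_cond += (c / n) * entropy(subset)   # P(x) * H(C | X = x)
    return entropy(labels) - h_cond
```

A feature that determines the class yields I = H(C) (1 bit for balanced classes), while a feature independent of the class yields I = 0, matching the interpretation above.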


In neural network implementations for classification tasks, the best features out of a large pool have to be selected, so that the best subset can be used as inputs to the net. Mutual information has been proposed for selecting that subset (Battiti, 1994; Kwak and Choi, 2002; Tesmer and Estevez, 2002). The main idea of the proposed algorithms is to build a feature ranking in which the top features are those whose MI with the class is high and that are not redundant with respect to the other features. The basic idea used is that I(X; Y) can be computed as the approximate value I(X; Y) \approx I(X; Y_r) + I(X; Y_nr) - I(Y_r; Y_nr), where Y_r and Y_nr correspond to the sets of features already ranked and not yet ranked, respectively. The set of non-ranked features is tested iteratively using a condition derived from the equation above, and the feature yielding the highest value is added to the ranking.

Implementations of the ideas above were tried to see how well these methods could help with the MI computation, but the MI values obtained for different spin image parameter sets did not agree with the accuracy of the classification results. Possible reasons for that outcome are the very high dimensionality of the feature space, and the fact that the equations proposed by Battiti (1994), Kwak and Choi (2002), and Tesmer and Estevez (2002) compute only an approximate value of MI. The approximation arises because the main purpose of those methods is to select the best features, with regard to class separability, from the whole set, rather than to compute MI values per se.

Therefore, we propose an alternative method to compute MI between the class variable C and the feature variable X, when X lies in a very high dimensional space, by using Equations 3-8, 3-9, and 3-10. For our problem, C can take two values, building or tree. An instance of the variable X, on the other hand, is represented by a spin image, and such an image has N*N features


corresponding to the number of pixels the image has. X is also a discrete variable because of the discrete domain of the values in each pixel. Therefore, to compute MI by Equation 3-8, we need to compute H(C) and H(C|X). H(C) can be computed by Equation 3-9 using the pmf estimate of C. To compute the pmf of a variable, if the domain cardinality of the variable is much smaller than the number of samples, the probability can be computed simply by counting the number of samples at each value of the domain (Jedynak and Khudanpur, 2005). In the case of C, with just two classes, meaning the cardinality of the domain is 2, and a number of samples on the order of hundreds, this method can be used to compute the probabilities associated with C. Accordingly, the pmf for C can be defined by

    P(C = C_i) = n_i / n    (3-11)

where i = 1, 2 for building or tree, n_i is the number of spin images available for class i in the training set, and n is the total number of images in the training set.

H(C|X) can be computed by Equation 3-10. In Equation 3-10, two terms have to be computed for each sample x (spin image) belonging to the training set X: P(x) and H(C | X = x). The conditional entropy can be computed by applying Equation 3-9, so we have:

    H(C | X = x) = -\sum_{c \in C} P(C = c | X = x) \log_2 P(C = c | X = x)    (3-12)

According to Equation 3-12, we need to compute P(C = building) and P(C = tree) given a sample x. Instead of computing these probabilities through a complex non-parametric method, we used the k-nearest neighbor rule applied in the final classification step of the classification algorithm. Applying the rule to the local neighborhood of each sample x, we have:


    P(C = C_i | X = x) = n_ix / k    (3-13)

where i = 1, 2 for building or tree, n_ix is the number of spin images of class i in the neighborhood of the sample x, and k is the number of neighbors. Therefore, simply by selecting the k nearest neighbors of each sample x and counting the neighbors belonging to each class, these two probabilities can be found and Equation 3-12 can be computed.

The last step in computing Equation 3-8 is finding P(x) in Equation 3-10. The same reasoning applied to compute P(C) could be applied; in this case, however, the cardinality of the domain is larger than the number of samples. For example, if x is a 20x20 pixel spin image and each pixel can take 4 values, the domain size would be 4^400. In addition, because of the high dimensionality, there is little chance that two samples are equal, so basically, for each x, the probability P(x) would be 1/N, where N is the number of samples. Instead, we propose to estimate this probability from k-nearest neighbor distances.

Equation 3-10 can be seen as an expected value of the conditional entropy. As such, a few values of P(x) are enough to compute a good estimate (Bishop, 2006). In addition, we should make sure that the summation of P(x) over the training samples equals 1. To obtain this, we compute P(x) as a function of how dense the probability function is in the local space where x is located. When a non-parametric method is used, we basically estimate P(x) by p \approx k / (N V), where k is the number of samples located inside a volume V centered at the computation point, in our case the sample location in the parameter space, and N is the total number of samples. That means that as the volume needed to hold the k samples increases, the estimated probability value


decreases. Using the distance from the sample to its kth neighbor instead of computing a volume, we compute P(x) as a function of the distribution volume by using distances:

    P(x) = (1/NT)(1/d_k),   where   NT = \sum_{i=1}^{N} 1/d_k(i)    (3-14)

where d_k is the distance from the sample to its kth neighbor, N is the number of samples, and NT is a normalizing term. As d_k increases, the volume holding the samples also increases and the estimate of P(x) decreases.

According to this, the method to compute MI between the classes and the features provided in the training set is as follows. Let C be the class variable, where C = {C_1, C_2}, C_1 being the building class and C_2 the tree class. Let X be an N*N-dimensional variable represented by the spin image samples, having N rows and N columns, that make up the training set. Let

    H(C) = -\sum_{i=1}^{2} (n_i / n) \log_2 (n_i / n)    (3-15)

denote the entropy of C, computed by using Equation 3-11 in Equation 3-9, where Equation 3-11 is the pmf estimate of C, n_i is the number of spin images available for class i in the training set, and n is the total number of images in the training set. For each spin image sample x of X, let

    H(C | X = x) = -\sum_{i=1}^{2} (n_ix / k) \log_2 (n_ix / k)    (3-16)

be the conditional entropy of C given the sample x, computed by using Equation 3-13 in Equation 3-12, where Equation 3-13 is the pmf estimate of C given x, n_ix is the number of spin


images available for class i in the neighborhood of the sample x, and k is the number of neighbors; and let

    P(x) = (1/NT)(1/d_k),   where   NT = \sum_{i=1}^{n} 1/d_k(i)    (3-17)

denote the pmf estimate of X at x, as stated in Equation 3-14, where d_k is the distance from the sample to its kth neighbor, n is the number of samples, and NT is a normalizing term. By substituting Equations 3-16 and 3-17 into each term of the summation in Equation 3-10, and then that result, along with Equation 3-15, into Equation 3-8, an estimate of the mutual information between C and X is obtained.

The availability of a fast method to compute MI between class and feature variables by processing the training data allows us to test several predefined spin image parameter sets, which can even depend on the spatial resolution of the data. If, as expected, high MI values produce high classification accuracy, then checking which parameter set gives the best MI value will identify the one producing the highest classification accuracy. Moreover, if several training areas are available, a more general parameter set that works over a broader spatial area can be defined and used for later data sets with similar characteristics.

The method explained above was tested over several sets of spin image parameters in selected training sites to assess which parameter set is best for classification purposes. Results of the tests are discussed below.

3.4.2 Sensitivity Analysis Results

To test how MI computation can be a useful tool for deciding the best spin image parameter set for classification, several combinations of parameter sets were defined and used for training and classification over a portion of the suburban dataset near the UF campus. This area is


a 100x150 m rectangle containing houses, with trees located very close to the houses that occlude, from a top view, parts of the house roofs. In this area, data points on two house roofs were used as training data for the building class. Also, from the nearby tree canopy, points inside two areas were used for training the tree class. Figure 3-5 shows a picture with the delimited classification area and the training polygons inside that area.

The parameter sets are the result of using several values for the spin image parameters: radius (R), bin size (bS), and angle resolution (aR). For the analysis, R takes the values 2, 4, 6, 8, and 10 meters; bS takes the values 0.25, 0.50, and 1.00 meters; and aR takes the values 4 and 2

Figure 3-5. Suburban Gainesville area for classification of ALSM data (SW 34th St. and SW 2nd Ave.). Polygons delimit training areas.


degrees. A value of 2 for aR means that, when computing the transform, the parameter space of the Hough transform has an increment of 2 degrees for the polar angles theta and phi, where theta ranges from 0 to 180 degrees and phi ranges from -90 to 90 degrees. By combining the values of these parameters, 30 different parameter sets were defined.

For each parameter set, spin images were computed for each point inside the training areas. Using the labels of the training areas, each spin image defining a point's features is related to a class (building or tree). Then, using the method discussed above, MI values were computed for each of the spin image sets resulting from the application of the different spin image parameter sets. The second column of Table 3-1 shows MI values for the 15 parameter sets corresponding to the several values of R and bS and a value of 2 degrees for aR.

Table 3-1. Correlation of mutual information (MI) values and classification accuracy rates (correctness percentage) for parameter sets at 2-degree angle resolution

R [m]   MI      Building CP* (%)   Tree CP* (%)   Total CP* (%)
bS = 1.00 m
10      0.8548  95.40              89.76          92.58
8       0.8235  92.25              91.25          91.75
6       0.7709  89.04              91.21          90.13
4       0.6914  90.00              88.34          89.17
2       0.5289  82.94              82.57          82.76
bS = 0.50 m
10      0.8879  96.52              96.65          96.59
8       0.9001  94.87              97.75          96.31
6       0.8821  94.49              97.85          96.17
4       0.8633  94.81              96.29          95.55
2       0.7461  87.70              89.95          88.83
bS = 0.25 m
10      0.8995  77.75              99.86          88.81
8       0.9227  80.86              99.88          90.37
6       0.9309  88.29              99.80          94.05
4       0.9168  91.82              99.12          95.47
2       0.8360  87.38              94.93          91.16

Note: *CP = correctness percentage
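As an illustrative sketch, the MI estimate of Equations 3-15 through 3-17 can be written compactly for a set of flattened spin images. This is not the dissertation's implementation: the function name is hypothetical, neighbors are found by brute force rather than with the kd-tree structures used at scale, and duplicate samples (which would give a zero kth-neighbor distance) are assumed absent.

```python
import math
import numpy as np

def estimate_mi(spin_images, labels, k=5):
    """MI between class and spin image features, per Equations 3-15 to 3-17.

    `spin_images` is an (n, N*N) array of flattened spin images and
    `labels` an array of 0/1 class labels (0 = building, 1 = tree).
    """
    X = np.asarray(spin_images, dtype=float)
    y = np.asarray(labels)
    n = len(y)

    # H(C), Equation 3-15: pmf of C by counting (Equation 3-11).
    h_c = 0.0
    for c in (0, 1):
        p = np.count_nonzero(y == c) / n
        if p > 0:
            h_c -= p * math.log2(p)

    # Each sample's k nearest neighbors (excluding itself), brute force.
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    order = np.argsort(d, axis=1)[:, 1:k + 1]

    # P(x), Equation 3-17: inverse distance to the kth neighbor, normalized.
    d_k = d[np.arange(n), order[:, -1]]
    inv = 1.0 / d_k
    p_x = inv / inv.sum()                      # NT = sum of 1/d_k

    # H(C|X), Equation 3-10, with H(C|X=x) from Equation 3-16.
    h_c_given_x = 0.0
    for i in range(n):
        for c in (0, 1):
            p = np.count_nonzero(y[order[i]] == c) / k
            if p > 0:
                h_c_given_x -= p_x[i] * p * math.log2(p)

    return h_c - h_c_given_x                   # Equation 3-8
```

On well-separated toy classes every neighborhood is pure, so H(C|X) vanishes and the estimate approaches H(C); shuffling the labels of the same samples drives the estimate toward zero, which is the behavior the parameter selection relies on.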


Looking at those MI values, parameter sets whose R values are greater than 2 m and whose bS values are either 0.50 or 0.25 m account for the larger MI values. The median point spacing of the training data is about 1 meter, which means that bin sizes on the order of one half to one quarter of the spatial resolution give the best results. That suggests that for this kind of suburban data, bin sizes of about these fractions of the spatial resolution should be considered.

In addition, the maximum values on those curves occur at about a 6-meter radius for a bin size of 0.25 m and an 8-meter radius for a bin size of 0.50 m. That shows that by tuning the spin image radius for a specific bin size, maximum MI values can be reached. However, because MI varies only slowly around its higher values, several choices of radius can also result in high MI values; that is the case when a 4-meter radius and bin sizes of 0.25 and 0.50 m are selected as parameters.

MI values were also computed for the same R and bS combinations but using a value of 4 degrees for aR. The MI values were all slightly smaller than those obtained for the previously used aR value and showed the same trends. As expected, the MI values increase as aR decreases. The MI difference seen in the analysis for the two aR values can nevertheless be useful when a tradeoff between the classification accuracy and the execution time of the classification has to be struck.

To relate these measurements to classification results, in addition to the training, classification of the area shown in Figure 3-5 was done, using the approach proposed in the previous section, for each of the training spin image sets resulting from the use of the 30 parameter sets. Because the selected training points were inside selected polygons of this area, a correspondence between MI values and classification accuracy rates is expected. To compute


classification accuracy rates, ground truth was obtained by carefully selecting building and tree points using pictures and by visualizing the ALSM data itself.

In Figure 3-6, the MI values for each parameter set using the best aR value and the classification accuracy rates are correlated and discriminated by bin size. One of the charts in this figure also shows correlation values for all the radii and specific bin sizes using the other aR value. In Table 3-1, MI and accuracy rates for the building and tree classes are summarized for the best aR value.

As can be seen in the four plots in Figure 3-6, despite some point clusters, a linear relationship can be inferred between the mutual information of the training set for a specific spin image parameter set and the classification accuracy rates for the area containing the training polygons. The point locations on the curves would probably show a smoother trend if a much larger set of training data, including more buildings and trees, had been used. Apart from some MI values in the bottom left chart, larger MI values relate to larger accuracy rates when the total number of points, made up of building and tree points, is used to compute the rate. That shows that training parameter sets yielding higher MI values predict a higher overall accuracy when the classification is done.

The plots also show that higher accuracy rates are obtained for classifying building points than for tree points where the bin size was 1 m. On the contrary, tree points are classified with higher accuracy than building points for bin sizes other than 1 m. Accordingly, as bin size increases, within the ranges that yield high MI values, a better classification of building points is obtained at the cost of decreasing tree accuracy rates; with decreasing bin size, the contrary happens.
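The linear relationship read off Figure 3-6 can also be checked numerically from the Table 3-1 values alone. The sketch below computes the Pearson correlation between the MI column and the total correctness percentage; it is only a quick check on the tabulated numbers, not part of the original analysis.

```python
import numpy as np

# MI and total correctness percentage (CP) from Table 3-1, in row order
# (bS = 1.00, 0.50, 0.25 m; R = 10, 8, 6, 4, 2 m within each group).
mi = np.array([0.8548, 0.8235, 0.7709, 0.6914, 0.5289,
               0.8879, 0.9001, 0.8821, 0.8633, 0.7461,
               0.8995, 0.9227, 0.9309, 0.9168, 0.8360])
total_cp = np.array([92.58, 91.75, 90.13, 89.17, 82.76,
                     96.59, 96.31, 96.17, 95.55, 88.83,
                     88.81, 90.37, 94.05, 95.47, 91.16])

# Pearson correlation between MI and overall accuracy.
r = np.corrcoef(mi, total_cp)[0, 1]
```

The correlation is clearly positive, consistent with the claim that parameter sets with higher training-set MI tend to produce higher overall classification accuracy.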


Figure 3-6. Correlation charts between mutual information (MI) values and classification rates. Each chart summarizes MI values from parameter sets using a single aR and bS value and the five radius values, and discriminates accuracy rates for building and tree points. An additional accuracy rate over the total number of points is shown. A) bS=1.00 m and aR=2. B) bS=0.50 m and aR=2. C) bS=0.25 m and aR=2. D) bS=1.00 m and aR=4.


Figure 3-6. Continued.


The slight variability of the MI values observed in the second column of Table 3-1 for specific bin size and radius ranges, and the linear relationship between MI and accuracy rates, suggest that the parameter sets yielding those higher values are the ones to use for obtaining high classification accuracy rates. Also, the sensitivity of building and tree accuracy rates to bin size, noted above, can be used later when improving building accuracy rates after a first classification is done. Figure 3-7 shows classification results for a specific parameter set, one among those that yield high MI values.

Figure 3-7. Classification results for bS=0.25 m, R=4 m, and aR=2. Bigger dots correspond to points classified as building. Smaller points correspond to points classified as trees. For this parameter set, MI=0.9168, and the accuracy rates are 91.82% and 99.12% for building and tree points, respectively.


3.5 Algorithm Complexity Analysis

As mentioned in the discussion of the sensitivity analysis, decreasing the angle resolution (aR) yields higher accuracy, but it also increases the computing time. Therefore, to balance the accuracy of the results against the computing time, a complexity analysis describing how the running time is affected by the different spin image parameters is needed. This analysis covers the classification stage of the algorithm, after a filtering step has removed the ground points.

In the classification stage, three steps are carried out for each non-ground point. First, a spin image is computed for the point. Then, the spin image is compared to the set of spin images computed at training time by using a k-nearest neighbor algorithm. Finally, a class decision is made according to the number of neighbors belonging to each class among the k neighbors.

3.5.1 Spin Image Computation

The first step comprises the following sub-steps. First, we need to find the points inside the neighborhood of each point. The defined neighborhood is a cylinder of radius R and height R centered at the point. The search is done over the whole point set, which has N points. To do this, a kd-tree data structure is used: a tree is built from the N points, and then a query over the XY coordinates is done. The time complexity of this range search is O(sqrt(N) + RP), where RP is the number of returned points. Because a further search in the Z dimension takes O(RP), the total complexity of this sub-step is O(sqrt(N) + RP). The cost of this sub-step therefore depends on the point density (returned points per area) and also on the radius R.

Once the neighbors are found, a Hough transform is applied to the neighborhood. For each returned point in the neighborhood, the cells of a matrix, where each cell represents a specific pair of theta and phi angles, are used to compute a rho value (theta and phi are spherical coordinates).


Because theta and phi vary from 0 to 180 degrees, the angle resolution (aR) determines how many cells have to be checked. With the modified Hough transform, however, just one of these angles has to be checked. Accordingly, the time complexity of this sub-step is O(RP * (180/aR)).

When the best-fitting plane is found, the actual spin image is generated: each returned point is assigned to the corresponding pixel in the image. The spin image parameters R and bS only affect storage complexity, not time, so the time complexity of this sub-step is O(RP).

To sum up, the time complexity of the first stage is O(sqrt(N) + RP) + O(RP * (180/aR)) + O(RP), which can be rearranged into O(sqrt(N) + RP * (180/aR)).

3.5.2 K-Nearest Neighbor Search

The second step, for each generated spin image, is finding the k nearest spin images from the training set. An approximate k-nearest neighbor search algorithm is used, whose running time depends on the number of training spin images. The algorithm first builds a data structure composed of kd-trees and box-decomposition trees. The time complexity of this search is O(log T), where T is the number of training spin images.

3.5.3 Classification

The final classification time for each point is constant and proportional to the number of neighbors (K) used for classification, so the time complexity is O(K). As defined for this algorithm, the value of K is sqrt(N); therefore, the time complexity is O(sqrt(N)).
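The three per-point steps analyzed above can be sketched end to end as follows. This is a minimal illustrative stand-in, not the production implementation: both the neighborhood search and the k-nearest neighbor search are brute force rather than tree based, the local plane normal is assumed to come from the modified Hough transform (here it is simply passed in), and all function names are hypothetical.

```python
import numpy as np

def spin_image(points, p, normal, R=6.0, bin_size=0.5):
    """Step 1 (Section 3.5.1): bin the cylinder neighborhood of `p`
    into an image of (distance from the normal axis, height along it)."""
    d = points - p
    beta = d @ normal                                  # signed height
    alpha = np.linalg.norm(d - np.outer(beta, normal), axis=1)
    keep = (alpha <= R) & (np.abs(beta) <= R)          # cylinder query
    nb = int(round(R / bin_size))
    img, _, _ = np.histogram2d(alpha[keep], beta[keep],
                               bins=(nb, 2 * nb),
                               range=[[0, R], [-R, R]])
    return img.ravel()                                 # counts per pixel

def classify_point(points, p, normal, train_images, train_labels, k=5):
    """Steps 2 and 3 (Sections 3.5.2 and 3.5.3): find the k nearest
    training spin images (brute force here) and take a majority vote."""
    x = spin_image(points, p, normal)
    dist = np.linalg.norm(train_images - x, axis=1)
    votes = train_labels[np.argsort(dist)[:k]]
    return np.bincount(votes).argmax()                 # 0=building, 1=tree
```

The brute-force searches make the per-point cost linear in N and T; replacing them with the kd-tree range search and the approximate nearest neighbor structure gives the O(sqrt(N) + RP * (180/aR)) and O(log T) terms derived above.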


3.5.4 Complexity in Time

Keeping in mind the three stages of processing done for each point, the overall time complexity is O(N * (sqrt(N) + RP * (180/aR))) + O(N log T) + O(N * sqrt(N)). Because the number of training spin images T is very small compared to N, the time complexity is finally O(N * (sqrt(N) + RP * (180/aR))).

Looking at the complexity expression, three things influence the running time. One of them, N, depends on the nature of the data. The bin size (pixel size) is important for storage size and classification results but not for running time. RP depends on the search radius R that defines the neighborhood and on the point density of the data set: a higher point density results in more returned points. The other variable is the angle resolution aR of the Hough transform, which should be as low as possible to obtain the best-fitting plane. On the other hand, if the point density becomes large, and RP with it, the radius R should be shortened so that the neighborhood returns enough, and only the necessary, points. This effect is actually seen in the sensitivity analysis: only a few specific values of R, depending on the point density of the data set, reach good accuracy results.

3.6 Results

To test the effectiveness of the method, ALSM data extracted from the captured data described in Section II were used for classification purposes. According to the selected areas, the scanned data can be classified into two kinds: suburban data and campus data.

Suburban data are located around the campus boundaries and are characterized by points from small and medium size houses and tall trees. The trees stand very close to the houses and, on many occasions, from the sensor's point of view, partially or totally occlude house roofs. Campus data, on the other hand, are characterized by taller and larger buildings relative to the


surrounding vegetation. For the campus case, roofs are more complex than those observed in suburban landscapes; they also show more variable architecture, and more infrastructure on top of the roofs.

For each of these cases, selected areas were chosen for processing. Over these areas, the training and classification stages were carried out. Classification accuracy results and the particularities of each case are analyzed below.

3.6.1 Classification of Suburban Data

For classification of suburban data, ALSM data covering houses and trees alongside 34th Street, between 5th and 7th Avenues, were used. The area is delimited by a 250x350 m rectangle including houses and trees on both sides of the street. The training polygons and the area where the polygons were delimited are shown in Figure 3-8. Classification results over the 250x350 m area are shown in Figure 3-9.

Figure 3-8. Suburban Gainesville area for classification of ALSM data (SW 34th St. between NW 6th Ave. and NW 7th Ave.). Polygons delimit training areas.


Figure 3-9. Classification results for bS=0.25 m, R=6 m, and aR=2. Bigger dots correspond to points classified as building. Smaller dots correspond to points classified as trees. Due to occlusion, classification accuracy rates were not computed. The upper rectangle delimits the training region shown in Figure 3-8. The lower rectangle delimits an area enclosing one of the house roofs partially occluded by tree points but detected by the algorithm.

For the training stage, the spin image parameter set was chosen according to the data and to the training done using MI, as described in Section 3.4. This decision was made because the buildings and trees of the area analyzed in that section are similar to those of the area to be classified. The parameter set for this stage was R=6 m, aR=2, and bS=0.25 m. The three rectangular areas


covering house roofs are used for training. Those areas correspond to gabled roofs surrounded by trees. The remaining polygons are the areas for tree learning.

In the classification stage, the same spin image parameter set was used. The rectangular area was extracted from a scan strip taken in a north-south flight-line direction. As can be seen, the boundaries of the scan area lie near the left side of the plot in Figure 3-9. That means that laser pulses at those locations hit roofs and trees at large scan angles.

Because many of the house roofs are partially occluded by trees, no computation of classification accuracy rates was done; however, because of the similarities of this area with the one analyzed in Section 3.4, similar rates are expected. Despite the occlusion, all the buildings in the area were detected. Results show that very few tree points were classified as building, and that some building points were classified as tree points at roof locations with complex roof shapes.

Tree misclassification can result from spatial point configurations that, when used for spin image generation, lead to small Euclidean distances to building samples in the parameter space. Building misclassification can be explained by the fact that on occluded roofs not enough points are available to compute the best-fit plane for spin image generation. The classification can also be affected if no complex roofs were learned in the training stage.

An additional result of the classification is that, despite the occlusion, building points under canopy were classified correctly. As a consequence of the ability of the laser pulses to penetrate vegetation, and of the side-looking capability of the ALSM system with respect to some targets at large scan angles, pulse returns are obtained from house roofs located under canopy that would otherwise be partially or totally occluded from a top view. In a suburban landscape such as the one around campus, many houses can be in that situation.
When those points are obtained, and they are enough to describe the roof geometry, they are classified as building points.


One of the partially occluded roofs is shown enclosed in the lower rectangle of Figure 3-9. Most of the points on this roof, even those beneath the tree canopy, are classified as building. In Figure 3-10, more detailed plots of the house showing the spatial allocation of the points are shown. From these plots, the extent of the overlap of the building and trees can be seen. The top view shows the building and tree points mixed in the XY plane, but correctly classified. Separating these points using classifiers based on 2.5 dimensions would not be possible, but the points were correctly separated using the spin-image based classifier. The three-dimensional allocation of the objects in the house-tree scene, shown in the remaining plots of Figure 3-10, can be captured by the spin images and used for classifying each point.

3.6.2 Campus Data Classification

For classification of campus data, ALSM data of buildings and trees at the south side of the football stadium were used. The area, delimited by a 120x85 m rectangle, includes buildings with complex architecture and trees of several species. The training polygons and the general area where the polygons were delimited and the points were classified are shown in Figure 3-11. Classification results over the area are shown in Figure 3-12.

For the training stage, the spin image parameter set was chosen following the MI criteria described in Section 3.4. The parameter set for this stage was R=6 m, aR=2, and bS=0.50 m. The rectangular areas covering building roofs were chosen for training. Those areas encompass a variety of points on gabled roofs, dormers, and flat surfaces common on campus buildings. The remaining polygons are the areas for tree learning and were chosen in an attempt to capture the characteristics of the different tree species that can be found on campus. Such a variety of building and tree shapes requires more extensive training than the suburban case.


Figure 3-10. Classified points near a partially occluded house roof. Filled dots correspond to points classified as building. Empty dots correspond to points classified as trees. A) Top view showing many of the building points mixed with tree points. B) Lateral view showing a portion of the roof and tree points above the house. C) Frontal view showing the shape of the roof and the vegetation above it. D) Perspective view of the house and tree points. E) Photograph showing a piece of the roof and the vegetation shown in the top center plot.


Figure 3-10. Continued.


Figure 3-11. UF Campus area for classification of ALSM data. Polygons delimit training areas.

In the classification stage, the same spin image parameter set as for the training was used. To quantify the accuracy of the classification, 10 buildings in the classification area were selected, and their footprints were used to delimit classified points. Those footprints were provided by the Division of Facilities Planning and Construction of the University of Florida. The building footprints are shown in Figure 3-12. As can be seen, these buildings have complex footprint shapes, and some of them have complex roof architecture. The points to be classified lie inside the 120x85 m rectangular area; for that reason, the ALSM data set only partially covers some of the 10 buildings.

Numerical results of the classification of the ALSM points shown in Figure 3-12 are presented in Table 3-2. For each of the selected building areas, and for the remaining tree area, the total numbers of points classified as building and as tree are shown, along with their percentages over the total number of points in each area.


Figure 3-12. Classification of the area enclosing UF buildings. A) Building footprints for the classification area. Buildings in white were chosen to test classification accuracy. B) Classification results for bS=0.50 m, R=6 m, and aR=2. Bigger dots correspond to points classified as building. Smaller dots correspond to points classified as trees.

As can be seen in the table, the percentage of correctly classified points over all the buildings is more than 90%. Half of the buildings present a classification accuracy rate greater


than 95%. Those buildings are characterized mostly by flat roofs and simpler architecture than the other buildings in the classification set. Building roofs with many dormers and gabled roofs present a lower accuracy rate.

On the other hand, the tree classification accuracy rate is about 85%, which is 9% lower than the average classification rate for the building areas. This result agrees with the observation in the sensitivity analysis that larger bin sizes cause a decrease in the accuracy rate of the tree class.

Although the rates presented in Table 3-2 are accurate for most buildings, they can be slightly misleading for two reasons. First, owing to the scale at which the campus building footprints were delineated, the building footprints cannot exactly coincide with the actual footprints. This situation leads to actual building points ending up outside of the footprints, affecting tree classification rates; likewise, actual tree points classified correctly can make building accuracy rates decrease if they fall inside a building footprint. Because the proportion of points close to building footprints is small with respect to the total number of points in the building and tree areas, the resulting bias in the accuracy rates should also be small.

Table 3-2. Classification accuracy rates of campus data points at R=6 m, bS=0.50 m, and aR=2

Classification area       B*      T*      %B      %T
Weil Hall                 8710    558     93.98   6.03
Reed Hall                 948     12      98.75   1.25
Weimer Hall               11901   1207    90.79   9.21
Nuclear Reactor (NRB)     2423    97      96.15   3.85
Rhines Hall               4558    356     92.76   7.24
MAE C                     2345    54      97.75   2.25
Gator Dining              6562    82      98.77   1.23
North Hall                1749    108     94.18   5.82
East Hall                 554     40      93.27   6.73
C.W.P. #1                 1160    5       99.57   0.43
All building areas        40910   2519    94.20   5.80
All tree areas            4760    26804   15.08   84.92

Notes: *B: number of points classified as building; *T: number of points classified as tree
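The correctness percentages in Table 3-2 are simply each class count over the total number of points in the area. A quick sketch (with a hypothetical helper name) reproduces them from the raw counts:

```python
def correctness(b, t):
    """Correctness percentages (%B, %T) from the building and tree
    counts of a classification area, as tabulated in Table 3-2."""
    total = b + t
    return round(100 * b / total, 2), round(100 * t / total, 2)
```

For example, Reed Hall's counts of 948 building and 12 tree points give 98.75% and 1.25%, matching the table.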


The second factor affecting classification accuracy rates is the presence of infrastructure on top of building roofs. Unlike the suburban case, many buildings on campus have antennas, fans, and other objects on their roofs. Because of their small size, few points return from these objects, and those returns lie above the roof surfaces; hence their spin images are more likely to be classified as tree than building. Because vegetation is not common on building roofs, such points can instead be treated as non-building points, leaving the actual building points classified as building.

An example of this kind of building roof is shown in Figure 3-13, where several views of the point cloud from Weimer Hall illustrate the complexity of the building and the objects on top of it. The top view shows some points inside the building footprint that appear to be misclassified and should be building points. However, in a perspective view of the classified point cloud, many of these points rise above the roof surface because they are returns from objects on the roof. The photograph in Figure 3-13 shows numerous dishes and antennas on the building; the largest dish and its returned points can be clearly seen in the upper left part of the photograph and in the perspective view, respectively.

Lateral and frontal views of the cloud also show non-building points at different levels above the building roof. In addition, the front view shows non-building points that are actually inside the building. Those points are returns from infrastructure located under the gabled roof seen in the upper left of the photograph in Figure 3-13. The roof material allows some laser pulses to pass through and hit objects beneath it. Because only some of the points are returns from the roof itself, not enough points are available for the local spin images to describe the surrounding geometry, and some of those points are classified as non-building.
By removing only the points from the largest dish and the dish next to it, and the points inside the building, the classification accuracy rate for the building class rises to about 96%.


Figure 3-13. Classified points on a complex building (Weimer Hall). The building shows infrastructure on its roof. Black dots correspond to points classified as building; grey points correspond to points classified as non-building. A) Top view showing the general footprint shape of the building. B) Perspective view showing several sections of the roof at different height levels. Many of the non-building points are actually above the roof surfaces and can be returns from roof-top infrastructure. C) Photograph showing the roof of the building and a set of dishes and antennas on it. D) Frontal view showing that many of the non-building points, at left, are from the infrastructure on the roof. In addition, non-building points, at right, are the result of the transparent glass gabled roof, seen in the upper left side of the photograph, which allows laser pulses to pass through that surface. E) Lateral view showing, at left, two dishes and also returns from antennas close to the gabled roof.


Figure 3-13. Continued.


Another complex building is Weil Hall. Although it does not have many objects on top of its roof, gables and dormers, as well as nearby trees, make the classification a harder task. Figure 3-14 shows a plot of the classified points and photographs of the building.

Figure 3-14. Classified points on a complex building (Weil Hall). In the plot at left, black dots correspond to points classified as non-building, and grey points correspond to points classified as building. In the plot at right, black dots correspond to points classified as building, and grey points correspond to points classified as non-building. A) Perspective view showing the general shape of the building. Some of the non-building points are actually returns from walls. B) Detailed perspective view of the building roof. Returned points from stairs, and from a tree next to them (seen also in the photograph in this figure), are correctly classified. C) Photograph of the roof of Weil Hall. D) Detailed photograph of the access door and stairs at the north side of the building.


Figure 3-14. Continued.

As can be seen in the plot, most of the points were correctly classified despite their location in complex parts of the roof. As with the journalism building (Weimer Hall), the classification rate is affected; in this case the rate is biased by the fact that some of the points classified as non-building are not actually on the roof but are part of the walls.

In addition, trickier building features can be classified. The plot in Figure 3-14 B shows that the stairs giving access to the building were also classified as part of the building, even though their points are not on any roof and a tree stands right beside them.

Given the observed fact that decreasing the bin size improves the tree classification rate at the expense of lowering the building classification rate, the area was reclassified using a spin image parameter set with a smaller bin size of 0.25 m. The results for this parameter set are shown in Table 3-3.


Table 3-3. Classification accuracy rates of campus data points at R=6 m, bZ=0.25 m, and aR=2

Classification area       B(a)    T(b)     %B      %T
Weil Hall                 7677    1591    82.83   17.17
Reed Hall                  875      85    91.15    8.85
Weimer Hall              10270    2838    78.35   21.65
Nuclear Reactor (NRB)     2269     251    90.04    9.96
Rhines Hall               4376     538    89.05   10.95
MAE C                     2290     109    95.46    4.54
Gator Dining              6203     441    93.36    6.64
North Hall                1691     166    91.06    8.94
East Hall                  522      72    87.88   12.12
C.W.P. #1                 1107      58    95.02    4.98
All building areas       37280    6149    85.84   14.16
All tree areas            1509   30055     4.78   95.22

Notes: (a) B: number of points classified as building. (b) T: number of points classified as tree.

As can be seen in the table, when compared to the results in Table 3-2, building classification rates decrease by as much as 10%, and the total tree rate increases by about 10%, up to roughly 95%. Apart from Weimer and Weil Hall, the more complex buildings, all buildings show accuracy rates above 87%. The overall accuracy rate for this case is 90.53%, very close to the overall rate of 89.56% obtained for the larger bin size.

Because each of these two classifiers performs more strongly on a different class, a combination of the two could improve the overall classification rate. When the class decision for each point is made in the classification algorithm, the posterior probability for each class is computed, and the largest probability value defines the class label. This value is stored along with the class label at the decision stage. Taking advantage of this, the combination can be done at the results level: once the results of each classification are obtained, they are processed to produce a new classification. Accordingly, for each point to be classified, two labels and their corresponding posterior probability values are available. Then the label corresponding to the larger


probability is assigned to the point. The classification results obtained by applying this rule are shown in Table 3-4.

The results for the combination of classifiers show that, except for Weimer Hall, the accuracy rates for all the buildings are above 90% and, in general, are closer to the higher rates obtained for the first classification than to those obtained with the smaller bin size. On the other hand, the accuracy rate for the tree area lies midway between the lower rate of the larger-bin-size classification and the higher rate of the second classification. The overall classification rate is 91.37%, roughly 1% and 2% higher than the overall rates of the two single classifications.

After points are classified and building points are obtained, a further classification of those points, according to the kind of roof each point was returned from, can be carried out. Training the algorithm with two kinds of areas, covering gabled and flat roofs respectively, a classification over the Weil Hall roof was performed; its results are shown in Figure 3-15. As can be seen, the majority of points were correctly separated into those from flat surfaces and those from tilted surfaces.

Table 3-4. Classification accuracy rates of campus data points for the combined classifier

Classification area       B(a)    T(b)     %B      %T
Weil Hall                 8499     769    91.70    8.30
Reed Hall                  943      17    98.23    1.77
Weimer Hall              11664    1444    88.98   11.02
Nuclear Reactor (NRB)     2418     102    95.95    4.05
Rhines Hall               4521     393    92.00    8.00
MAE C                     2339      60    97.50    2.50
Gator Dining              6512     132    98.01    1.99
North Hall                1739     118    93.65    6.35
East Hall                  551      43    92.76    7.24
C.W.P. #1                 1149      16    98.63    1.37
All building areas       40335    3094    92.88    7.12
All tree areas            3200   28364    10.14   89.86

Notes: (a) B: number of points classified as building. (b) T: number of points classified as tree.
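The result-level combination described above reduces to a one-line decision rule: for each point, keep whichever label was assigned with the larger posterior probability. A hypothetical sketch (function and label names are illustrative, not the dissertation's code):

```python
# Result-level fusion of two classifiers: each point carries two
# (label, posterior) pairs -- one per bin size -- and the label backed
# by the larger posterior wins.

def combine(label_a: str, post_a: float, label_b: str, post_b: float) -> str:
    """Pick the label whose classifier was more confident."""
    return label_a if post_a >= post_b else label_b

# A point the 0.50 m classifier calls 'building' (p=0.91) and the
# 0.25 m classifier calls 'tree' (p=0.62) keeps the 'building' label.
print(combine("building", 0.91, "tree", 0.62))  # building
```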


Figure 3-15. Classification of points on flat and gabled roofs in Weil Hall. Filled dark dots correspond to points on gabled roofs or dormers. Empty dots correspond to points on flat surfaces.

The high classification rates could be improved even further by fusing the ALSM data with ground-based lidar data. Spin images are well suited to capture 3D geometry at any spatial resolution; therefore the same classification method can be applied to ground-based lidar data and to the fusion of the two kinds of data. Results of the classification of fused data are shown by Caceres and Slatton (2007b).

Supervised learning as the basis of the classification method, and the use of raw data together with spin images as the point feature descriptor, proved successful for separating building points from other classes, even where those classes spatially overlap the building class. The use of a modified Hough transform algorithm allows robust and fast plane detection as a preliminary step for spin image generation. Regarding spin image quality and its effect on classification accuracy, the best generation parameter set can be defined optimally using a mutual-information-based algorithm. Given these aspects, and the fact that automation is achievable after training, real-time processing is feasible.
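The mutual-information criterion mentioned above scores a candidate spin image parameter set by how much the pixel values tell us about the class labels. A minimal sketch of the underlying estimate, computed from a joint histogram of co-occurring (pixel value, label) pairs (the function name and toy data are illustrative, not the dissertation's implementation):

```python
import math
from collections import Counter

def mutual_information(pixel_values, labels):
    """I(X;Y) in bits, estimated from paired (pixel value, label) samples."""
    n = len(pixel_values)
    pxy = Counter(zip(pixel_values, labels))   # joint histogram
    px = Counter(pixel_values)                 # marginal over pixel values
    py = Counter(labels)                       # marginal over class labels
    return sum((c / n) * math.log2((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())

# Perfectly informative pixel values yield I(X;Y) = H(Y) = 1 bit here.
print(mutual_information([0, 0, 1, 1], ["tree", "tree", "bldg", "bldg"]))  # 1.0
```

A parameter search would evaluate this score over candidate (bin size, radius, angle resolution) sets and keep the maximizer.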


CHAPTER 4
IMPROVED CLASSIFICATION OF BUILDING INFRASTRUCTURE FROM AIRBORNE LIDAR DATA USING SPIN IMAGES AND FUSION WITH GROUND-BASED LIDAR

Modern ALSM systems can routinely provide data point densities of 1-10 points per square meter, depending on the system configuration, flying altitude, and overlapping flight lines. The position accuracy of the points is roughly 15-20 cm in xy and 5-10 cm in z (Shrestha et al., 1999), which is sufficient for terrain mapping and general building detection. However, when recognition of infrastructure on buildings and modeling of building details are required, higher-resolution range data, such as the data provided by ground-based laser scanners, are used. Ground-based systems can provide points at high spatial resolution with sub-centimeter position accuracy.

The work presented here focuses on improved classification of ALSM point clouds into building and non-building points using the spin-image-based method explained in Chapter 3. The data are fused with ground-based laser scanner data to improve discriminability among complex building architectures, adjacent vegetation, and even rooftop infrastructure. Below, a short review of methods for segmenting ground-based scanner data is presented. Then, details on the preprocessing and fusion of the data are given. Finally, results of the processing of the fused data and conclusions are presented.

4.1 Review of Processing Methods for Ground-Based Scanner Data

Like ALSM data, ground-based laser scanner data have been processed with image processing and computer vision techniques for segmentation, classification, and feature extraction. Belton and Lichti (2006) developed a covariance-based procedure to perform classification and feature extraction for terrestrial laser scanning point clouds using local neighborhoods.
A method that classifies points using the variance of the curvature in a local neighborhood, combined with a region-growing algorithm, is used to segment the surfaces. Rabbani et al. (2006)


present a method for segmenting point clouds using a smoothness constraint, which finds smoothly connected areas in point clouds. The process has two steps: normal estimation and region growing.

Pu and Vosselman (2006) show an approach to automatically extract building features from terrestrial laser-scanned data by processing the point cloud with various segmentation algorithms. Here, the planar growing algorithm consists of two steps: determining a seed surface and growing the seed surface. Then, drawing on expert prior knowledge about buildings and assuming that building features resemble planes, several constraints are defined for size, position, direction, and topology. A more general approach for registering ground-based scanner data is given by Rabbani et al. (2007). In their work, unlike most methods, modeling of the data is done as a step prior to registration. Portions of the point cloud are fitted to predefined, basic geometric primitives, and models found in more than one data set are used for registration.

4.2 Data Acquisition and Pre-Processing

The data for this work were provided by the Geosensing Engineering and Mapping (GEM) Research Center at the University of Florida. The data were taken over Weil Hall at the University, which was chosen because it exhibits several different kinds of roof morphologies and is encircled by numerous trees, making it a difficult test case. The ALSM data were captured by the Optech 1233 unit described in Chapter 2. The ground-based laser scanner is an Optech, Inc. ILRIS-3D unit. It is mounted on a 10 m telescoping boom in the back of a truck so that it can easily acquire ranges to urban building faces from a variety of viewing angles. It uses a 1535 nm wavelength pulsed laser capable of producing point clouds of discrete ranges from 3 m to 1500 m for targets with 80% reflectivity at a laser pulse rate of 2,000 Hz.
Depending on the angular separation between adjacent pulses and the distance to the target, point spacing from a


few millimeters to a few centimeters can be obtained. When mounted in this manner, the system is referred to as the Mobile Terrestrial Laser Scanner (M-TLS).

By positioning the ILRIS unit on the roof of a nearby building, it was possible to illuminate portions of the roof of Weil Hall. Thus, we could compare the advantages of ground-based data over building faces separately from ground-based data over building roofs. Before the M-TLS data can be fused with the ALSM data and analyzed, they must be pre-processed so that multiple scans can be merged into one. Below, we give more detail on this process.

The Optech ILRIS-3D ground-based laser scanner is used to collect point clouds for the sides (walls or faces) of buildings. The building structures were scanned frame by frame with a mean point spacing of about 10 cm. Through an alignment process, these laser data sets were combined to form a whole scene of a building. In addition, a GPS survey accompanied the laser scanning to provide ground control information for the coordinate transformation.

PolyWorks software was used to align and merge the data sets. The software provides an interactive mode for manually aligning 3D data, in which users are required to select at least three tie-point pairs between two data sets. The data alignment is done through an iterative algorithm that computes an optimal alignment by minimizing the 3D distances between surface overlaps in a set of 3D data. The next step is to align the M-TLS point cloud to the ALSM point cloud to geo-reference the M-TLS data; the differences between them are translations and rotations. To do this, four GPS receivers were set up in the scanned area to collect geographic coordinates, so that these receivers were also scanned as laser points. After obtaining the control point coordinates by manually measuring the tripod tops, 3D translation and rotation transformations were computed using Terrascan software.
These transformation parameters were used to geo-reference the data. Figures 4-1 and 4-2 show a picture of the scanned area and the fused data, respectively.
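The rigid registration step described above can be illustrated with the standard SVD-based (Kabsch/Horn) least-squares fit of a rotation and translation to matched tie points. This is a sketch of the idea only, not the PolyWorks or Terrascan implementation; the simulated control points and all names are hypothetical:

```python
import numpy as np

def rigid_fit(src: np.ndarray, dst: np.ndarray):
    """Return R (3x3), t (3,) such that dst ~= src @ R.T + t (least squares)."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    # Cross-covariance of the centered tie-point sets.
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:      # guard against a reflection solution
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = dst_c - R @ src_c
    return R, t

# Four simulated tripod-top control points in the scanner frame (src)
# and their geo-referenced twins (dst): a 90-degree rotation plus a shift.
src = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]], float)
true_R = np.array([[0, -1, 0], [1, 0, 0], [0, 0, 1]], float)
dst = src @ true_R.T + np.array([10.0, 20.0, 5.0])
R, t = rigid_fit(src, dst)
print(np.allclose(src @ R.T + t, dst))  # True
```

With three or more well-spread, non-collinear tie points, the fit is unique; the recovered R and t can then be applied to the whole M-TLS cloud.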


Figure 4-1. Weil Hall at the University of Florida.

Figure 4-2. Point cloud of fused ALSM and M-TLS data corresponding to the area enclosed by the dashed line in Figure 4-1. Highest points are in red; lowest points are in blue.

4.3 Classification Results

For all the classifications described in this chapter, the method explained in Chapter 3 was used. Method parameters such as angle resolution (aR), bin size (bS), and neighborhood radius (R) were defined according to the sensitivity analysis done in Chapter 3. Parameter values of


bS = 0.25 m, aR = 2, and R = 5 m were found to yield effective classification of ALSM data, M-TLS data, and the fused data sets.

Classification of three data "types" (ALSM data, M-TLS data, and fused ALSM and M-TLS data) was performed on data captured over the University of Florida campus. Before classification, preprocessing and training on the data are required.

The ALSM point cloud is pre-processed by filtering out ground points using a commercial software package for point cloud processing known as MARS (Merrick and Company, 2007). Then, a single file containing both first and last returns from the non-ground points is ready for processing. Algorithm training is done using the three available data "types". For ALSM data and fused data, training was done over an area around Weil Hall. In this context, training means manually selecting and labeling tree and building points. Figure 4-3 shows the area and the point sets where training was done.

Figure 4-3. Training areas in Weil Hall. Elevations are color-coded; all units are in meters. Black polygons indicate training data. Ground points have been filtered out, so that only the building and vegetation classes remain.


4.3.1 Classification of ALSM Data

For each point in each class set, spin images are computed and stored for subsequent classification. Figure 4-4 shows classification results for ALSM data; green points correspond to the tree class, and blue points correspond to the building class. The classification accuracy depends on the choice of spin image and Hough transform parameters. Overall, the classification results are good (see Table 4-1). For the results in the table, R = 5 meters and aR = 2 degrees.

4.3.2 Classification of M-TLS Data

Because the emphasis of the project is on classifying fused data, only a few tests were done on M-TLS data alone. The optimum parameter values for spin images and Hough transform fitting mentioned above were used. Figure 4-5 shows classification results for M-TLS data. The average spacing of the M-TLS points is about 10 cm.

4.3.3 Classification of Fused Data

For classification of fused data, several combinations of training data and data to be classified were tested. Because classification based on ALSM data alone had been done before, classification of fused data was performed to see how the segmentation results compared to those for ALSM data. This classification was done using training information from ALSM data and from the fusion of ALSM and M-TLS data taken around the Weil Hall area.

To assess how much, and what kind of, improvement is achieved when fusing the data, a segmentation of the M-TLS data that separates points on walls from points on roofs was performed. Points on walls are expected to help classification near roof edges, whereas points on roofs are expected to improve classification results at interior roof ridges. Therefore, two fusion procedures were done: one fuses the ALSM data with M-TLS wall points; the other fuses M-TLS roof points with the ALSM data. Counting the ALSM data alone, three data sets were available for classification (see Table 4-1).
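All of these classifications rest on the spin image feature introduced in Chapter 3: each neighbor of an oriented point is mapped to a 2D (alpha, beta) pair and accumulated into a histogram. The sketch below follows the standard spin image construction; it assumes the point's normal has already been supplied by the modified Hough transform plane fit, and the parameter names mirror R and bS (the values and the toy neighbor are illustrative only):

```python
import numpy as np

def spin_image(p, normal, neighbors, radius=5.0, bin_size=0.25):
    """2D (alpha, beta) histogram of neighbors around oriented point p.

    alpha: radial distance from the axis through p along `normal`;
    beta:  signed elevation above the fitted plane through p.
    """
    n = normal / np.linalg.norm(normal)
    d = neighbors - p
    beta = d @ n
    alpha = np.sqrt(np.maximum(np.einsum("ij,ij->i", d, d) - beta**2, 0.0))
    n_a = int(radius / bin_size)
    n_b = 2 * n_a                       # beta spans [-radius, radius)
    img = np.zeros((n_b, n_a))
    keep = (alpha < radius) & (np.abs(beta) < radius)
    ia = (alpha[keep] / bin_size).astype(int)
    ib = ((beta[keep] + radius) / bin_size).astype(int)
    np.add.at(img, (ib, ia), 1)         # unbuffered accumulation
    return img

# A neighbor 1 m above the fitted plane and 2 m off-axis falls in one bin.
img = spin_image(np.zeros(3), np.array([0.0, 0.0, 1.0]),
                 np.array([[2.0, 0.0, 1.0]]))
print(img.sum())  # 1.0
```

The resulting histograms (one per point) are the feature vectors handed to the supervised classifier.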


For fused data, the training was done over the same training areas used for ALSM data alone (see Figure 4-3). This training was done only for wall M-TLS data fused with ALSM data. Training on roof M-TLS data fused with ALSM data was not considered because of redundancy: classifying this kind of fused data using such training data would be very similar to classifying ALSM data using training data from ALSM data alone, only at higher resolution. Accordingly, two training sets (including training over ALSM data) were used for our classification experiments.

Figure 4-4. Classification results for ALSM data. The data shown are located around Weil Hall. Green points correspond to the tree class, and blue points correspond to the building class.

Figure 4-5. Classification results for M-TLS wall points only. Green points correspond to the tree class, and blue points correspond to the building class. The wall in the picture belongs to the north side of Weil Hall.


Five experiments were carried out to compute classification accuracy under different acquisition scenarios. For all the experiments, a 0.25 m bin size was chosen. Because of the heavy computational effort required to process very high resolution data, a small area of campus was studied. The chosen area, a rectangular subset inside the Weil Hall footprint, is shown in Figure 4-6. The dimensions of this small test area are 15 m northing by 83 m easting.

In the first experiment, classification and training were done on ALSM data alone. A top view showing the rectangular area for classification is shown in Figure 4-6. In this experiment, about 98% of the building points were correctly classified. In all five experiments, values of R = 5 m, aR = 2, and a 0.25 m spin image pixel size were used. For a summary of classification results, see Table 4-1.

Table 4-1. Classification accuracy percentages as a function of training data sets and fused data combinations

Expmt.  Train.  Fused Data     B      NB       T      NT
1.a     T1      ALSM         98.67    1.33   85.40   15.60
2.a     T2      F1           92.93    7.07   20.28   79.02
3.a     T2      ALSM         87.80   12.20  100.00    0.00
4.b     T1      F2           99.97    0.03   76.34   23.67
5.a     T1      F1           99.33    0.67    8.48   91.52

Expmt.: Experiment number. Suffix 'a' denotes cases where the testing data consisted of 3450 building points and 226 tree points. Suffix 'b' denotes cases where the testing data consisted of 11409 building points and 226 tree points. Testing sets of different sizes resulted from the inclusion of different ILRIS data sets.
Train.: Training sets. T1 stands for training done on ALSM data alone. T2 stands for training done on the fusion of wall M-TLS points and ALSM data.
Fused Data: Point cloud data used. ALSM stands for ALSM data alone. F1 stands for the fusion of wall M-TLS points and ALSM data. F2 stands for the fusion of roof M-TLS points and ALSM data.
B: Building points classified as building points.
NB: Building points classified as tree points.
T: Tree points classified as tree points.
NT: Tree points classified as building points.


Figure 4-6. Classification results for Experiment 1 superimposed on architectural building footprints of Weil Hall and surrounding buildings. Building points correctly classified are in red, and misclassified points are in black.

In the second experiment, wall M-TLS data fused with ALSM data were classified using training done on the fused data. The purpose of this experiment was to compare classification results with those obtained using ALSM data alone. The classification results are not better than those obtained in Experiment 1. This result may seem counterintuitive because there are more samples on the buildings. However, the point density on the building walls sampled by the ILRIS unit is much higher than that of the roofs sampled by the ALSM in this case. For considered points near, but not on, the roof edge, it is possible for the large number of wall points to bias the plane fitting by the modified Hough transform during the training process. Also, for considered points near the roof edges, the pixel values in the spin images that correspond to the wall will be biased upwards during training due to the high point counts. These effects in turn cause some misclassification of building points near roof edges in this experiment. Some misclassified points are located where no training data were taken, which suggests that more training data on these walls is warranted.

In the third experiment, ALSM data were classified using training on fused data. The building classification accuracy was the worst among the experiments (see Table 4-1). The reason for that


can be explained by the fact that the wall information present in the training data is absent from the classification set; hence roof points close to wall-like geometry are misclassified as tree points. In this particular case, more training data should be taken.

In the fourth experiment, roof M-TLS data fused with ALSM data were classified using training done on ALSM data alone. The purpose of this experiment was to quantify how much improvement is obtained when additional information from building roofs is available. A perspective view from the north of this classification is shown in Figure 4-7. Accuracy results are shown in Table 4-1. The roof classification was the best among the five experiments: of the points on roofs, only 2 out of 11406 points were incorrectly classified.

The fifth experiment is very similar to the fourth one. Wall M-TLS data fused with ALSM data were classified using training done on ALSM data alone. As observed in the fourth experiment, the points on walls helped to improve the classification of buildings. A perspective view from the north of the classification results is shown in Figure 4-8. Accuracy results are shown in Table 4-1.

Figure 4-7. Classification results for Experiment 4 (perspective view from the north-west side). Building points correctly classified are in blue; misclassified points are in green.


Figure 4-8. Classification results for Experiment 5 (perspective view from the north-west side). Building points correctly classified are in blue; misclassified points are in green.

In terms of building detection accuracy, Experiment 5 was the best, with Experiments 4 and 1 achieving accuracies almost as high. In terms of tree detection accuracy, Experiment 3 was the best, and Experiment 1 was second best. Averaging over buildings and trees, Experiment 3 was the best, with Experiment 1 a close second. Thus, applying the spin image segmentation to ALSM data only (Experiment 1) gave reasonably good results overall. Fusing ALSM with M-TLS data during testing improved building segmentation, but fusing ALSM with M-TLS during training did not help building segmentation.

4.3.4 Roof Classification

Once we have successfully separated buildings from trees, we wish to further classify different roof segments and roof types. For example, when using ALSM data alone, we generally do not have sufficient information about the building sides to robustly segment roof edges that are near precipitous drop-offs. This ambiguity occurs often with multi-level contiguous buildings. An example of this staircase roof configuration on Weil Hall is shown in Figure 4-9. By fusing ALSM and M-TLS wall data and using the same spin image operations, we are able to


separate roof edges near dangerous drop-offs from those near steep, but less precipitous, drops (see Figure 4-10).

Another application of the spin-image-based segmentation is the discrimination of different types of roofs. This classification was done on ALSM point data alone (without fusion) to show the robustness of the spin image segmentation. After separating building points from tree points, a classification of roof points was done using the same spin image operations. By training the algorithm with points from dormers located on the roof of Weil Hall and points from flat surfaces, two classes were learned by the algorithm. Classification results over Weil Hall are shown in Figure 4-11. It should be noted that such discrimination would generally require direct plane fitting or specialized searches for slopes using empirical thresholds, and both of those methods can be overly sensitive to incidental height variations due to air conditioners, vent pipes, and short edge walls. In our case, this segmentation was made without having to alter the spin image operations, thus demonstrating the robustness of the method.

Figure 4-9. Flat roofs on Weil Hall and a nearby building.


Figure 4-10. Roof points from the enclosed areas in Figure 4-9. Roof edges near large drop-offs are in green. Flatter roof segments and edges near shorter drop-offs are in blue.

Figure 4-11. Classification of angled and flat roofs over Weil Hall. Flat sections are in green. Angled sections are in blue.

From the results, it can be seen that the fusion of M-TLS roof and wall data with the airborne Lidar data obtained the best performance, possibly due to the additional geometric information and a better fitting of the planes for spin image generation. More importantly, it was found that training with M-TLS data can be bypassed without degrading the quality of the final results.


The spin-image-based method proved successful at different spatial resolution levels of the Lidar data; however, across those levels, the selection of the best parameter set for generating the spin images is a critical factor affecting the quality of the final results.


CHAPTER 5
AUTOMATIC BUILDING FOOTPRINT DELINEATION FROM ON-EDGE RETURN IDENTIFICATION IN LIDAR DATA

5.1 Introduction

Delineation and geo-referencing of building footprints constitute an important information source for urban planning, military strategy, civil engineering, and other fields where the location of infrastructure is an important component. Aerial photographs, satellite imagery, and more recently airborne Lidar data have been used for the extraction of these boundaries. Traditionally, the recognized advantages of Lidar are its ability to operate at night and its reduced shadowing (since the illumination comes from near zenith). However, the spatial point density of Lidar has typically meant that it cannot represent sharp features (e.g., building roof edges) as precisely as aerial photographs. Recent advances in Lidar technology are mitigating that edge distortion.

The work proposed here aims at detecting and extracting building footprint boundaries automatically from raw Lidar data. The method is based on statistical segmentation and two-dimensional (2D) line fitting. It is intended to yield locations as accurate as those obtained from photogrammetric methods. In the following sections, a review of the literature on this topic is presented, and an explanation of the proposed method is given.

5.2 Literature Review

Many strategies operating on several types of data have been tested to define building footprints. Photogrammetric techniques, digital image processing, and point cloud processing have been tried on low- and high-resolution photographic and Lidar data. Below, different algorithms and techniques are reviewed and classified according to the data source they use.

5.2.1 Photograph-Based Techniques

Photogrammetry has been the most common means by which the extraction of building boundaries has been carried out. After photograph processing to obtain the internal and external


orientation of stereo-models, footprints and other topographic features can be manually or semi-automatically delineated by digitizing them on physical or digital pictures using a digitizing table (Wolf and Dewitt, 2000).

When digital pictures are available, monocular processing by digital image and computer vision techniques is used. Segmenting pixels into building or non-building classes to obtain building edges, or directly extracting boundaries represented by linear or curved features (Huertas and Nevatia, 1988), are two of the most used techniques for building footprint delineation. Other schemes allow complete automation of the process. In the work of Shufelt (1999), a performance review of automatic algorithms that process a single image is presented. In general, those methods assume, after detecting linear features, that building footprint shapes are composed of rectangular portions that are combined to obtain the final shape. A more complete study, focused on the detection problem instead of the extraction problem, is found in Lemmens's work (1996).

Many proposed methods for footprint delineation rely on photogrammetric techniques, based on stereoscopy, used for 3D building modeling. Some methods have implicitly defined building footprint delineation as one of their processing steps; others treat the delineation as a secondary result. User interaction is sometimes required, whether for selecting initial regions or for digitizing points. A few of them attempt to automate the whole process.

By using height information besides radiometric values, Oriot and Michel (2004) detect footprints for simple buildings using a Hough transform that looks for rectangular shapes. Active models are used for irregular building footprint delineation. The method is semi-automatic in the sense that users select a portion inside the building to be delineated.
In the work of Weidner and Forstner (1995), footprint delineation is the last step in a process where a DSM is first obtained through stereoscopy; buildings are then detected by using


both thresholds and the fact that buildings are higher than the surface around them. Once buildings are detected, a model-driven approach is applied to fit parametric shapes to the detected building areas. Gabet et al. (1997) show a similar approach where a building DEM is extracted from the initial DSM and the obtained footprints are manually corrected.

Automatic processing of data has been a heavily pursued goal. In Gruen's work (1998) and the work of Gruen and Wang (1998), three-dimensional points on roofs are manually detected from a stereo model. Edge, gable, and eave points are used to delineate a building model. Then, an automatic process fits a roof model from a database. From the models, footprints can be extracted. Zebedin et al. (2006) propose an almost automated method, although with some restrictions, for photogrammetric building modeling.

Pattern recognition approaches have also been applied to the problem. Brunn and Weidner (1998) use DSMs and Bayesian networks to detect building regions. Then, using information about normal directions at roof points, filtering is carried out to obtain step and crease building edges. Samadzadegan et al. (2005) show how photographs are processed by using neural networks and fuzzy reasoning over structural, spectral, and texture information. Building regions are then modeled in three dimensions by fitting points, polygons, or volumes to the extracted regions.

Not only conventional aerial photographs have been used for 3D modeling. Fraser et al. (2002) use a stereo model of imagery from the Ikonos satellite to extract buildings. A comparison of the extraction quality with that obtained from a conventional stereo-pair is presented. Planimetric and height accuracies of building features, as well as the number of detected buildings, are compared in the study. These accuracies are computed against ground truth given by GPS measurements.


5.2.2 Methods Based on Data Fusion

Because of the availability of Lidar and SAR data, some attempts have been made to combine these types of data with photographs to obtain three-dimensional urban models. Haala and Brenner (1999) use spectral information from infrared-color data to segment areas. Then, by using a DSM from Lidar data, geometric information is applied over the segmented building areas to obtain the models. In a similar way, in McIntosh and Krupnik's work (2002), building edges are extracted from aerial imagery and then combined with the Lidar-based DSM and DTM to refine the three-dimensional information of those building features. The 3D edges are extracted from photogrammetric processing. An accuracy comparison with ground truth measurements, extracted from conventional photogrammetric methods, is also given. Tupin and Roux (2003) use sequential SAR imagery and optical photographic data to build the outlines. First, linear primitives are extracted from the SAR data. Then, rectangular shapes are sought in the optical data at the SAR feature locations by using edge detectors. When complex footprints are found, corners are searched for instead of lines in rectangular positions.

Using only SAR observations has also proved feasible for building extraction. In the work of Simonetto et al. (2005), high-resolution SAR imagery is used to reconstruct 3D buildings. Footprints are obtained from Hough transform-based processing, and stereoscopic measurements help in computing heights. Xu and Jin (2007) extract 3D cuboids representing building shapes from high-resolution multi-aspect SAR imagery. A Hough transform for recognizing parallel segments and a matching algorithm for recognizing different aspects of a building are used in an automatic scheme.

Due to the advantages Lidar data offer when capturing geometric information, they have been fused with other kinds of data to attack the problem.


Ikonos imagery and Lidar data, previously geo-referenced, are fused by Sohn and Dowman (2007) to extract buildings. After calculating a Lidar-based DTM, height information from it and Ikonos-based normalized difference vegetation indexes (NDVI) are used to detect building points from the Lidar point cloud. Threshold height parameters have to be defined for the building detection. In the reconstruction stage, linear features from the Ikonos data are extracted and, together with initial building polygons from the classified Lidar data, are used in an approach based on data- and model-driven techniques. The purpose is to fit building models, in the form of convex polygonal units, to the data. The method is based on constructing a binary space partitioning (BSP) tree, whose nodes represent building edges, to sequentially add polygon edges to initial building footprint estimates.

A building detection strategy based on the fusion of multi-spectral and Lidar data is presented by Rottensteiner et al. (2007). Using Dempster-Shafer theory, DSMs from the first and last Lidar returns, along with a DTM extracted from those DSMs and a pseudo-NDVI image, are processed for building detection. The NDVI index is obtained by fusing infrared Lidar intensity with RGB picture information. Height differences between the DSM and the DTM, NDVI values, the height difference between the first and last return, and texture information are used as inputs to the classification process.

5.2.3 Lidar-Alone-Based Methods

Because of its inherent geometric information, using Lidar data alone has also been proposed as a method for constructing building models and hence delineating building footprints. Most methods call attention to the sampling acquisition method as a factor in the generally lower edge accuracy when compared to edges acquired with photogrammetric methods. Maas and Vosselman (1999) show a method based on the Hough transform and plane intersection working over raw data.
In contrast, gridded data are used by Alharthy and Bethel (2002), Morgan


and Habib (2002), and Haithcoat et al. (2001), allowing the use of morphological methods and region-growing algorithms for initial segmentation on DSMs.

Assuming prior building point segmentation, Maas and Vosselman (1999) propose a method for modeling that relies on the intersection of building roofs. By using a Hough transform, previously segmented raw Lidar data are processed to detect building planes. Building ridges, where plane intersections occur, are automatically detected. Roof outlines are defined as polygonal shapes having edges aligned to the building orientation, parallel to ridge directions. Instead of a least squares adjustment of edges, the process takes into account that all the building points must lie inside the roof outline.

Using gridded Lidar data, Alharthy and Bethel (2002) show an algorithm to detect and reconstruct 3D features such as buildings. Statistical inference about fitting surfaces in small windows over data points is the basis of this work. After segmentation, the modeling part is performed. Initial footprints are obtained by applying a threshold to a previous DSM. Then, from the raster data, boundaries are delineated. The buildings are constrained to have two dominant directions; cross-correlation matching of a template is used for that purpose. The modeling is completed by assigning the mean height of the building as the height of a flat roof.

Also based on gridded data, Morgan and Habib (2002) initially segment the data with a region-growing algorithm and classify them by a morphological method. A triangulated irregular network (TIN) is built using the classified points. From there, roof ridges are extracted by breakline identification. Building outlines are determined by a least squares adjustment of the centers of triangles that connect building and terrain points, enforcing, this way, that the building points are located inside the footprint.


Haithcoat et al. (2001) derive a DSM from Lidar data and segment building and non-building points based on height and gradient information at each point. Gradients are computed by convolution, which allows points on building footprints to be detected. A conventional raster-to-GIS conversion and posterior refinement define the final footprints. Fitting models to the data completes the reconstruction.

Footprint delineation is usually shown as an additional result of a 3D modeling effort or as an intermediate step in the whole process. However, some studies have focused only on the delimitation problem. Using gridded data, Zhang et al. (2006) devise an automatic method to obtain the boundaries. In the work of Sampath and Shan (2007), using raw data, the boundaries are detected and extracted by segmenting, proximity clustering, and modified convex-hull detection. Because these methods use Lidar data alone, in a similar fashion to our proposed technique, a more detailed review of them is given next.

Zhang et al. (2006) delimit footprints after segmentation of gridded Lidar data is completed. Building recognition is done through a region-growing algorithm where the growing parameter is how well a point and its neighborhood fit planar surfaces. From the building areas, a 2D coarse footprint is extracted based on border pixels. The zigzagged building segments are initially regularized using the Douglas-Peucker algorithm, and then those segments are adjusted according to the dominant directions of the footprint, generally two. The dominant directions are found by minimizing a cost function as a rotation angle with respect to a coordinate frame is applied. The expression to minimize is a function of the length of the segments that make up the building and how close their angles are to the rotation angle.

Sampath and Shan (2007) use raw Lidar data, in contrast to the previous method.
A four-step methodology to obtain a 2D footprint is shown. This approach includes the extraction of building


points, identification of the points belonging to each building, border point identification, and final boundary regularization. The initial segmentation into ground and non-ground points is done by analyzing the slopes between consecutive points in a vertical profile, where the profile is the one defined by the scan line. Then, points belonging to the same building are found by a region-growing algorithm that assigns points to a building if they are inside a moving window centered at a point previously assigned to that building. Using the building points, a modified convex-hull algorithm creates an initial footprint. Instead of obtaining a convex polygon that wraps all the points, concavities are allowed when the steps of a conventional convex-hull algorithm are applied to small neighborhoods centered at each building point. The final step is the regularization of the footprint based on finding dominant directions. By tracking the border points and checking consecutive edge slopes, edges with minor slope variation are grouped into segments. Those segments are then used in a restrictive least squares adjustment to fit line segments having slopes oriented along only two principal directions. In addition, longer segments are given more weight in the adjustment solution.

The method proposed here takes raw Lidar data and extracts 2D building boundaries. Segmentation of points into ground and non-ground points, and then separation of building points from the non-ground segments, is done beforehand. By using the best data capture technique, including the configuration of the Lidar system, flight plan definition, and the return-order identification of laser pulses, points on roof edges can be detected and separated from points inside the footprints. After identification, a method based on least squares adjustment is applied to obtain the footprint. A more detailed explanation of the method and its justifications is given below.
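The selection of on-edge candidates sketched above (building points whose parent laser pulse produced more than one return) reduces to simple boolean filtering over per-point attributes. The array names below are assumptions for illustration, not the dissertation's actual implementation; most LAS point-cloud readers expose equivalent fields.

```python
import numpy as np

def select_edge_candidates(num_returns, is_building):
    """Flag building points whose parent pulse produced multiple returns.

    A pulse that splits its energy on a roof edge yields more than one
    return, so among points already classified as 'building', those from
    multi-return pulses are candidate on-edge returns.
    """
    multi_return = num_returns > 1
    return is_building & multi_return

# Toy example: five points, three on buildings, one from a 2-return pulse.
num_returns = np.array([1, 2, 2, 1, 1])          # returns recorded per pulse
is_building = np.array([True, True, False, True, False])
mask = select_edge_candidates(num_returns, is_building)
```

Only the second point survives the filter: it is classified as building and its pulse recorded two returns.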
5.3 Method

The scheme for obtaining the building footprint delimitation, after the data is captured, is made up of two main steps: detection of building points, and extraction or modeling of the footprint.


Many of the previous efforts using Lidar data, raw or gridded, present a similar approach. Also, in most of them, footprint delimitation is an intermediate step in the process towards a complete 3D building modeling. Moreover, all of them infer the exact location and extent of the building footprint from segmentation information. In our case, however, because of the nature of Lidar data, the footprint delimitation is done in the final step by directly using Lidar measurements on building edges.

Reviewing other methods that work with Lidar data alone, we can see how different our method is from theirs. Regarding segmentation, all the reviewed methods first identify points on building roofs. Our method, in addition, identifies and separates returns on roof edges from other building points. The premise here is simple: energy from laser pulses is fractionally reflected when electric wires, poles, birds, trees, brush, and roof edges are hit by the pulses (Slatton et al., 2007). By appropriately segmenting and obtaining building points, and then selecting building points computed from a laser pulse with more than one return, points on building edges can effectively be segmented. The challenge comes when attempting to define a building point as also being on an edge.

It has been assumed that the chance of a pulse hitting an edge is low. This assumption is basically supported by the coarse horizontal point spacing obtained by commercial systems (about 1 m). However, new findings using the latest systems show that the probability can be higher. Experiments carried out at the University of Florida have shown that in-scan-line point spacings ranging from 0.6 m to 0.3 m can be achieved when pulse repetition frequencies ranging from 100 kHz to 166 kHz are used. With these spacing distances, and with pulse footprints of about 0.30 m, ground, trees, and building roofs can be almost entirely covered in those


places under the scan line (see Figure 5-1). Additionally, if four returns per pulse can be recovered, there is a higher probability of identifying edges.

After segmentation, a model fitting based on least squares is performed over the recognized edge points, using the inside-footprint points if necessary.

Before explaining the processing steps in detail, a simulation of the scanning process to compute the probability of identifying edges, and details about issues affecting the system settings, are discussed. The section also gives an explanation of expected accuracies and the level of automation.

5.3.1 Scanning Process Simulation

To compute the probability of a laser pulse hitting building edges, we performed simulations of scanning processes over hypothetical edges.

For these simulations, it is assumed that the plane flies straight north with roll and pitch held at zero, at adjustable flight speed and flight altitude. The system parameters that can be set are: laser repetition rate, scan frequency, maximum scan angle, and beam divergence.

Figure 5-1. High-resolution location of laser footprints on roofs. Higher pulse repetition frequencies increase the edge hitting probability.


The building edges are simulated as parallel lines spread within the scanned area. The lines have an orientation measured in azimuth units and are separated by an adjustable distance. Figure 5-2 shows the graphical interface of the simulator.

The simulation is simplified to the case of two returns recorded per pulse, and it is also assumed that at least 10% of the beam energy must hit an object to get a return, whether the object is a ground portion or a roof section. For simulation purposes, that means that to detect an edge, the intersection of the laser footprint with the edge must result in two footprint sections, each one accounting for at least 10% of the total footprint area.

Figure 5-2. GUI for the ALSM scanning simulator. Adjustable blue values correspond to the values to be set for the simulation. Red values correspond to the ones used for computing the probability of a laser pulse hitting a building edge.


Once the system and edges are set and the simulation run, the probability of a laser pulse hitting an edge is computed using the geometric intersections between edge lines and scan lines, treating the scan lines as actual lines rather than what they really are: a series of laser pulses with some distance between them. By counting the number of pulses hitting an edge and comparing it to the number of intersections, a percentage of successful detection is computed. Assuming that the number of intersections is the maximum value that the number of pulse hits on edges can reach, the percentage can be interpreted as a probability.

Using the settings shown in Figure 5-2, a simulation of 1 second of flight time was run. For that simulation, 392 detected edge-pulse intersections out of 2408 possible edge-scan line intersections were obtained, resulting in a probability of 0.162 of a pulse hitting an edge. The graphical results are shown in Figure 5-3 (top). By changing the laser repetition rate to 100 kHz, the scan frequency to 40 Hz, and the edge orientation to 45 degrees, the probability rises to 0.491 (1214 detected intersections out of 2472). The graphical result for this simulation is shown in Figure 5-3 (bottom). Looking at the bottom figure, we see that the average distance between detected intersections along an edge is much smaller than the average distance in the first simulation. This is also a result of the point density, and it suggests that the number of detected intersections per linear meter along the edge is a quality factor for the final footprint delineation.

As expected, probability values depend on the system and flight configuration. This suggests that by trying different configuration settings, and as long as those configurations result in similar spacing distances along and across the flight path, good detection probabilities can be obtained.
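The detection criterion can be illustrated with a deliberately simplified one-dimensional Monte Carlo sketch. This is not the GUI simulator described above: it assumes the scan line crosses the edge perpendicularly, places pulses at a fixed spacing, and asks how often the pulse nearest the crossing splits at least 10% of its footprint area on each side of the edge (the circular-segment area follows the standard formula, consistent with Equation 5-7 later in the chapter).

```python
import math
import random

def segment_fraction(R, dist):
    """Fraction of a circular laser footprint lying beyond a straight edge
    at perpendicular distance `dist` from the footprint centre."""
    if dist >= R:
        return 0.0
    h = R - dist  # height of the circular segment cut off by the edge
    area = R * R * math.acos((R - h) / R) - (R - h) * math.sqrt(2 * R * h - h * h)
    return area / (math.pi * R * R)

def edge_hit_probability(spacing, R, min_frac=0.10, trials=20000):
    """Monte Carlo estimate of the chance that the pulse nearest a random
    edge crossing puts at least `min_frac` of its energy on each side."""
    hits = 0
    for _ in range(trials):
        offset = random.uniform(0.0, spacing)  # edge falls between two pulses
        dist = min(offset, spacing - offset)   # distance to nearest pulse centre
        # the smaller segment fraction is at most 0.5, so checking it alone
        # guarantees the larger side also exceeds min_frac
        if segment_fraction(R, dist) >= min_frac:
            hits += 1
    return hits / trials

random.seed(1)
# 0.6 m pulse spacing and a 0.30 m footprint (R = 0.15 m), as in the text
p = edge_hit_probability(spacing=0.6, R=0.15)
```

Under these assumptions the estimate comes out near one third, the same order as the simulator's first run; the full simulator additionally accounts for edge orientation and the sawtooth scan geometry.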
Table 5-1 shows probability values computed for several laser repetition rates and scan frequencies. The remaining parameters are set to the values shown in Figure 5-2.
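By construction, each probability in Table 5-1 is simply the ratio of detected pulse-edge intersections to possible scan line-edge intersections. As a quick sanity check, the counts transcribed from the table reproduce the tabulated probabilities to within rounding:

```python
# (laser rate [kHz], scan freq [Hz], pulse-edge hits, scan line-edge crossings)
# transcribed from Table 5-1
rows = [
    (33, 28, 392, 2408), (70, 28, 840, 2408), (100, 28, 1257, 2408),
    (125, 28, 1528, 2408), (143, 28, 1737, 2408), (166, 28, 2040, 2408),
    (33, 40, 360, 3440), (70, 40, 1040, 3440), (100, 40, 1201, 3440),
    (125, 40, 1560, 3440), (143, 40, 2001, 3440), (166, 40, 2040, 3440),
]
probabilities = [hits / crossings for _, _, hits, crossings in rows]
```

The highest ratio belongs to the 166 kHz / 28 Hz configuration, matching the table's best probability of 0.847.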


Figure 5-3. Detected pulse-edge intersections for two simulation runs. Green lines correspond to simulated edges. Red dots correspond to laser pulses. Black dots correspond to laser pulses intersecting edges. Only a 200-meter portion of the 400 m swath width is shown. (Top) Results for the first simulation run. (Bottom) Results for the second simulation run.

5.3.2 System Setting

According to the simulations above, some system configurations will allow us to capture more edge hits than others. At first sight, it seems that at least two perpendicular flight directions should be flown to obtain returns from edges. A direction oblique to the main directions of the building blocks, if such directions exist, could be attempted.

One thing we must be aware of is that at large scan angles, the location of the linear edge on the building with respect to the scanning system influences the ability of the system to detect the edge. When buildings are located at a large scan angle with respect to the system and the main direction of the buildings is aligned along the flight direction (see Figure 5-4), pulses hitting the


edges that are farther from the system will probably have a second return from the ground or other structures. On the other hand, pulses hitting closer edges will have a fraction of the pulse hitting the side walls. Due to the dead time of the system, the returns from walls won't be recorded most of the time, and the ability to recognize the edge in that pulse vanishes. (Or, more likely, the wall returns will be mixed with the roof edge return and yield an error in the range measurement.)

To detect edges in any position on the building, the area could be flown using two parallel flight lines separated by half a swath width (see Figure 5-4). Using these flight lines, most building edges would be scanned from the best configuration possible for recording a second return from one or the other of the flight lines.

Figure 5-4. Building scanned by parallel flight lines. The system mounted on the airplane will record a second return from the farther building edges (pulse direction in black lines) but probably won't record any second return from the closest edges (pulse direction in red lines). By combining parallel flight lines, most pulses hitting edges could record a second return.


Table 5-1. Probability of a laser pulse hitting an edge for different ALSM system configurations

Laser repetition rate [kHz]  Scan frequency [Hz]  Probability  Pulse-edge intersections  Scan line-edge intersections
 33    28    0.162     392    2408
 70    28    0.348     840    2408
100    28    0.522    1257    2408
125    28    0.634    1528    2408
143    28    0.721    1737    2408
166    28    0.847    2040    2408
 33    40    0.104     360    3440
 70    40    0.302    1040    3440
100    40    0.349    1201    3440
125    40    0.453    1560    3440
143    40    0.581    2001    3440
166    40    0.593    2040    3440

5.3.3 Expected Footprint Accuracy

Given that returns from edges can be detected, a more accurate fitting of a model can be done. As seen in previous works, building footprints have been delineated only from the knowledge that points are on building roofs. When raw data have been used, the basic assumption was that the most exterior building points must be on the footprint or help to define it, and a least squares type of processing was done to fit linear segments. Because of the high uncertainty about whether the points were actually on the edge, location accuracy was compromised. When gridded data are used, it is easier to get the boundary points, but not without further degrading their location accuracy. Because in our approach edge positions are not guessed but directly measured, higher accuracies in the location of building footprints are expected, even compared to the ones obtained from photogrammetry.

These accuracy expectations follow from the outcomes shown by Shrestha et al. (1999). That study shows that, when compared to heights extracted from profiles computed by conventional surveying, orthometric heights calculated from a Lidar-based DTM present RMS


errors between 6 and 10 cm. A comparison of photogrammetric and Lidar methods for obtaining heights is also done in that study. By comparing heights from DTMs derived from the two technologies, at the planimetric places where the measurements coincide, a difference in heights ranging from 2 to 7 cm, with standard deviation between 6 and 8 cm, could be reached. This shows that with appropriate methods, the accuracy obtained from Lidar is as good as that obtained from photogrammetric methods.

Although those results are promising, we still need to define how we measure accuracy. The accuracy that we are interested in calculating is the horizontal location accuracy of the footprints. This is not always mentioned in previous works on footprint delimitation. The earlier photogrammetric papers reviewed for this work make no mention of location accuracy because they are focused on automation of the solution. Another reason is that accuracy depends on the quality of the photographs (spatial resolution, pair overlapping, etc.). Moreover, as long as the automatic process does everything a manual operator would, the obtained accuracy would be at the same level as the accuracy normally obtained by manual methods meeting some standards. Accordingly, the accuracy assessment these works show is done by visually comparing the footprints obtained in each case with a background picture, and sometimes by verifying whether all the footprints were detected.

Some newer papers do show accuracy results, however. For example, one of them compares the building footprint nodes to manual measurements in a photogrammetric station (Samadzadegan et al., 2005). Comparisons with 3D points extracted in a photogrammetric station have also been used as ground truth in some papers (McIntosh and Krupnik, 2002; Haithcoat et al., 2001).


An alternative approach compares the footprints pixel by pixel with ground truth footprints, and several values derived from the false positives and negatives and true negatives and positives are computed (Sohn and Dowman, 2007). Two papers related to footprint delimitation (Zhang et al., 2006; Sampath and Shan, 2007) assess accuracy by visually comparing footprints against orthophotos and against footprints digitized from the pictures. They also assess quantitative accuracy by computing commission/omission area errors, in a way similar to Sohn and Dowman (2007), by comparing the footprints with the ones digitized from pictures.

The work proposed here focuses on the spatial accuracy of footprint features. To compare the accuracy of photogrammetric-based and Lidar-based footprint delimitation methods, an explanation of expected values is given below.

Photogrammetric accuracy. The accuracy of maps obtained from photogrammetry must comply with map accuracy standards. Common standards are the ones defined by the American Society for Photogrammetry and Remote Sensing (ASPRS), and the National Map Accuracy Standards (NMAS) defined by the U.S. Bureau of the Budget in 1941 (Falkner, 1995). The ASPRS standards define a three-class accuracy system, in which Class 1 is the strictest, Class 2 allows errors twice as large as Class 1, and Class 3 errors are at most three times as large as Class 1. The NMAS standards are very similar and are encompassed by Class 1 and 2 errors in the ASPRS standards.

The horizontal accuracy in the ASPRS standards is measured in feet or meters at ground scale. The accuracy is defined in terms of the RMSE (root mean square error), and for each class the limiting allowed RMSE defines the accuracy class. In order for maps to comply with the accuracy requirements, a high percentage of the differences between the map point


measurements and a higher-accuracy survey of the points on the ground must be less than the RMSE value.

The accuracy is linearly related to the map scale (the target scale at which the map is drawn) of the final planimetric or topographic product (U.S. Army Corps of Engineers Staff, 1995), and decreases as the map scale increases. Because increasing accuracy means a decrease in the limiting RMSE value, there exists a direct linear relationship between the map scale and the limiting RMSE value. The limiting values are given in tabular form for specific scales; however, a relationship between these values can be extracted.

The target map scale can be defined as the distance in feet measured on the ground of any feature measuring one inch on the map. If t is the target map scale, the limiting RMSE (lRMSE) value for all the accuracy classes is

    lRMSE = c·t/100                                    (5-1)

where c is the accuracy class. For Class 1 accuracy and a target scale of 50 feet, the limiting value is 0.5 feet.

Target scales are the result of an enlargement of the photographic (negative) scales of the set of pictures used to make the map. The relation between the negative scale (n) and t depends on the accuracy class and is given by

    t = n/e                                            (5-2)

where e is the enlargement factor. For accuracy Classes 1, 2, and 3, the e values are 7, 8, and 9, respectively, when analytical stereo plotters are used.


The target and negative scales can also be expressed as the ratio of the measurement of a feature on the map to the feature measurement on the ground. Because 1 foot is equal to 12 inches, the relation between the ratio (r) and the negative scale is given by

    r = 1/(12·n)                                       (5-3)

Replacing (5-3) in (5-2), and then that result in (5-1), we have

    lRMSE = c/(1200·e·r)                               (5-4)

The photographic scale in ratio form can be expressed as the ratio of the focal length of the camera (f) to the height above ground (h) at which the pictures were taken. Replacing this in (5-4), we have

    lRMSE = c·h/(1200·e·f)                             (5-5)

The expression above shows that the limiting RMSE is directly and linearly related to the flight height; therefore, as the flight height increases, the RMSE value increases and the accuracy of the resulting map decreases. Figure 5-5 shows the relationship between RMSE values and flight heights for a mission using a 152 mm focal-length camera. The figure shows the linear relationships for the three accuracy classes when an analytical stereo plotter is used for processing.

Lidar-based accuracy. The accuracy of footprints detected by Lidar systems is tied to the horizontal positional accuracy of the laser returns provided by the system. In addition to this value, the diameter of the laser footprint interacting with the target adds uncertainty to the return location accuracy.
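The photogrammetric limiting-RMSE chain (Equations 5-1 through 5-5) collapses into a single function. This is a sketch for evaluating the relationship, not part of the dissertation's processing; the unit handling (focal length in millimetres, flight height in feet) is an assumption made for the example.

```python
def limiting_rmse_ft(accuracy_class, enlargement, focal_mm, height_ft):
    """Limiting RMSE in feet from Eq. 5-5: lRMSE = c*h / (1200*e*f).

    Focal length and flight height must share units, so the focal
    length is converted from millimetres to feet (1 ft = 304.8 mm).
    accuracy_class: ASPRS class c (1, 2, or 3); enlargement: factor e.
    """
    focal_ft = focal_mm / 304.8
    return accuracy_class * height_ft / (1200.0 * enlargement * focal_ft)

# Class 1 (e = 7) with the 152 mm camera of Figure 5-5, flown at 3000 ft
rmse = limiting_rmse_ft(1, 7, 152, 3000)
```

Doubling the flight height doubles the limiting RMSE, reproducing the linear trend plotted in Figure 5-5.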


Figure 5-5. Relationship between the height of the flight in a photogrammetric mission and the accuracy of the maps extracted from the photographic processing. Values follow ASPRS accuracy standards. The focal length of the camera is 152 mm.

The footprint diameter at the target is a function of the flight altitude and the beam divergence of the system, and is given by (Baltsavias, 1999):

    fD = h·γ / cos²θ                                   (5-6)

where fD is the footprint diameter, h is the flight altitude, γ is the beam divergence, and θ is the instantaneous scan angle. Equation 5-6 assumes that the terrain is flat; it gives the circle diameter when θ is zero and the semi-major axis of the resulting ellipse when θ is larger than zero. Because the return location is assumed to be at the middle of the footprint, the amount of uncertainty added is about half of fD.

In order for the sensor to detect a return, a minimum fraction of the laser energy must hit the target. Assuming a circular-shaped footprint, and given that at least that fraction of the energy must hit the target for any one of the returns generated by the pulse, footprints of


returns hitting edges will have smaller uncertainty because the center of the footprint will be closer to the edge (see Figure 5-6). Assuming that at least 10% of the pulse energy must hit a target to sense a return, any detected building edge must be hit by at least 10% and at most 90% of the pulse energy. Accordingly, the percentage areas of the two circular segments in Figure 5-6 (the larger and the smaller one) must be values between 10% and 90%. The area of the smaller segment in the figure is given by

    A = R²·cos⁻¹((R−h)/R) − (R−h)·√(2Rh − h²)          (5-7)

where R is the footprint radius, and h and d are the fractions of the radius shown in Figure 5-6. The value of h that yields an area A of 10% of the total footprint area is about 0.3R. Accordingly, the uncertainty in the footprint location would be half of R+d, or half of 0.5·fD + 0.2·fD. The accuracy of a single return location detected with the minimum fraction of energy is then defined by

    fA = sA + 0.7·fD/2                                 (5-8)

where fA is the footprint accuracy, sA the system accuracy, and fD the footprint diameter. Replacing 5-6 in 5-8, we have the final expression for computing this accuracy:

    fA = sA + 0.7·h·γ/(2·cos²θ)                        (5-9)

For the Gemini system described in Chapter 2, the beam divergence can be adjusted to 0.3 or 0.8 mrad (0.0003 or 0.0008 rad). The system accuracy (sA), if calibration of the Lidar capture system is appropriately done, is proportional to the flight altitude (h) and given by:

    sA = h/5500                                        (5-10)
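Equations 5-6 through 5-10 combine into a short numeric sketch. The 10% detection threshold and the h/5500 system-accuracy rule are taken from the text; everything else (function names, the example altitude) is illustrative.

```python
import math

def footprint_diameter_m(height_m, divergence_rad, scan_angle_rad=0.0):
    """Laser footprint diameter from Eq. 5-6: fD = h*gamma / cos(theta)^2.
    Off nadir this is the semi-major axis of the elliptical footprint."""
    return height_m * divergence_rad / math.cos(scan_angle_rad) ** 2

def system_accuracy_m(height_m):
    """Horizontal system accuracy from Eq. 5-10: sA = h / 5500."""
    return height_m / 5500.0

def return_accuracy_m(height_m, divergence_rad, scan_angle_rad=0.0):
    """Edge-return location accuracy from Eq. 5-9:
    fA = sA + 0.7 * fD / 2."""
    fD = footprint_diameter_m(height_m, divergence_rad, scan_angle_rad)
    return system_accuracy_m(height_m) + 0.7 * fD / 2.0

def segment_fraction(u):
    """Eq. 5-7 as an area fraction for a segment of height h = u*R:
    A / (pi*R^2) = (acos(1 - u) - (1 - u)*sqrt(2u - u^2)) / pi."""
    return (math.acos(1.0 - u) - (1.0 - u) * math.sqrt(2.0 * u - u * u)) / math.pi

# h near 0.3 R does give roughly a 10% segment, as stated in the text
frac_near_tenth = segment_fraction(0.31)
# 600 m flight at the narrow (0.3 mrad) divergence at nadir
fA = return_accuracy_m(600.0, 0.0003)
```

At 600 m and nadir this gives a footprint of 0.18 m and a single-return accuracy near 0.17 m, consistent with the curves in Figure 5-7.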


Figure 5-6. Location of a laser footprint with respect to a target edge. R is the footprint radius, and h and d are fractions of R. When added, the h and d lengths equal R.

Figure 5-7 shows the relationship between the accuracy of the return location and the flight altitude for this system. The accuracy curves for features extracted by photogrammetry are also shown. Accuracies obtained at narrow beam divergence should be better than those from photogrammetric Class 2 and 3 processing and slightly larger than those from Class 1 photogrammetric processing. At 600 m altitude, the difference in accuracies is just one tenth of a foot. For wide beam divergence, the accuracies are not as good as the narrow-beam accuracies, but better than photogrammetric Class 3.

Although the best photogrammetric class accuracies are better than the best obtained from Lidar, they measure slightly different phenomena. In the photographic case, the accuracies refer to the extracted features (building edges, point locations). In the Lidar case, the accuracies refer to return locations. Because the final building footprints are extracted through a least squares based process, the footprint corners, computed by using returns on building edges, are expected to have better accuracy.

In a least squares adjustment, the accuracy of the unknown adjusted values is inversely proportional to the square root of the difference between the number of observations and the number of unknowns. Therefore, the accuracy of the adjusted corners will improve as the number of available observations for computing that corner increases.


Figure 5-7. Relationship between the flight height of a Lidar mission and the expected accuracy of the laser returns. A scan angle of zero degrees is assumed. Red: values obtained with a 0.0003 rad beam divergence. Green: values obtained with a 0.0008 rad beam divergence. Blue: photogrammetry-based accuracy values shown for comparison. Values follow ASPRS accuracy standards.

Accordingly, if many observations are available to compute the coordinates of a corner, the accuracy of the detected corner (cA) can be expressed by

cA = \frac{fA}{\sqrt{m-1}} (5-11)

where fA is the accuracy of a single observation, represented by a laser return, and m is the number of observations. When Lidar data are captured, the number of pulse returns on edges depends on the system settings and the flight plan. Because a single expression relating accuracy values to setting configurations is difficult to obtain, Equation 5-11 can be used to assess the expected accuracy of building features. Figure 5-8 shows the accuracy curves of a single corner for different numbers of available observations. The accuracies in Figure 5-8 show that, under ideal capture and calibration conditions, accuracy values at photogrammetric levels can be obtained.
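Equation 5-11 can be evaluated directly; the short sketch below tabulates the adjusted-corner accuracy for a few observation counts (the single-return accuracy value used here is illustrative only).

```python
import math

def corner_accuracy(fA, m):
    """Adjusted-corner accuracy cA = fA / sqrt(m - 1) (Eq. 5-11),
    where m is the number of returns used to compute the corner."""
    return fA / math.sqrt(m - 1)

fA = 0.17  # illustrative single-return accuracy in meters
for m in (2, 5, 10, 20):
    print(m, round(corner_accuracy(fA, m), 3))
```

With m = 5 observations the corner accuracy is exactly half the single-return accuracy, since sqrt(5 - 1) = 2.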


Figure 5-8. Relationship between the flight height of a Lidar mission and the expected accuracy of the footprint corner coordinates. A scan angle of zero degrees is assumed. Red: values obtained with a 0.0003 rad beam divergence. Blue: photogrammetry-based accuracy values shown for comparison. Values follow ASPRS accuracy standards.

5.3.4 Automation

The method proposed here is automatic, as are only a few other Lidar-based methods. Methods based on photogrammetry also claim complete automation. In photogrammetry, however, automatic derivation of a DSM is a hard task to achieve, at least with results fulfilling manual-method standards. Gruen et al. (2000) show, when testing the performance of digital photogrammetric stations (DIPS), that automatically computed DTMs present RMS error values larger than 0.7 m when compared to a manually obtained DSM. Those tests were done using three different DIPS over three datasets. Moreover, when height differences were averaged, the values obtained were always positive, which suggests that those systems overestimate height values.

A more recent attempt to provide automation when processing aerial photographs is shown by Zebedin et al. (2006). Most of the steps in that study are based on several computer vision and image processing algorithms. The first step involves classification of RGB and color


infrared pictures through a supervised method based on support vector machines (SVM). Classes describing man-made objects, vegetation, ground, and water are specified. Then aerial triangulation, under several assumptions, is done using matched points and control points to adjust the data block. After that, dense point matching is done to obtain a high-resolution DSM. Ortho images and the DSM are fused to finally classify the data and obtain building areas. Although no building footprint delineation is done, it is clear that footprints could be obtained from the building areas.

As can be observed from those methods, possible obstacles to automation are the large number of different algorithms and parameter configurations needed for every step in the process. For example, in a very critical step, the matching process, texture, relief, vegetation coverage, and scale all factor into how and when to accept or reject the matching pairs. These factors affect the way the correlation matrix size and its thresholds are defined.

One of the characteristics of our method is that the segmentation and modeling can be done in local neighborhoods based on previously acquired statistical knowledge. Because of this, and because the method works on raw data, the automation can eventually be applied to real-time processing. Below, a more detailed description of the method is provided.

5.4 Method Steps

As previously described, our method has two preliminary steps and two main steps. The preliminary steps are the setting of the system for data capture and the preprocessing of those data. The two main processing steps are classification and footprint modeling. The system setting has to do with the definition of the system parameters and the flight plan. Preprocessing is simply data formatting to include return information per pulse. Classification aims to detect building points both in footprint interiors and on edge boundaries.
The modeling is the final part


where the footprint is fitted to the segmented data. Below, more details are given about each step.

5.4.1 System Setting and Data Capture

The Lidar system settings and the flight plan were chosen according to the discussion in Sections 5.3.1 and 5.3.2. To test the footprint modeling method, ALSM observations were carried out covering a fraction of the University of Florida campus containing trees and buildings. The building set on campus ranges from simple building roofs to very complex constructions with infrastructure on top of them. The data were collected in April 2008, using the Optech Gemini system described in Chapter 2. The initial reduction and processing of the ALSM observations followed standard procedures developed by UF researchers.

As mentioned in Chapter 3, the campus buildings show a wide variety of heights, forms, and rooftop geometries, along with many different tree types next to the buildings. These characteristics make building segmentation and the regularization of building footprints difficult. The footprint processing is particularly affected by the irregular geometry of some buildings.

Analyzing the simulation results shown before, the system was set up seeking the most regular point spacing possible, 50% overlap scanning, and a point density of about 5 returns per meter. The system parameters were set as follows: the laser repetition rate was 125 kHz, the maximum scan angle was 18 degrees, and the scan frequency was 40 Hz. The beam divergence angle was set to 0.0003 rad (narrow angle). According to the simulations, a wider angle would yield larger probabilities of hitting building edges, but the accuracy values would be about twice as large as those from the narrow beam angle. Because one of the goals of this work is obtaining good accuracy values, the probability of hitting edges was instead increased with a flight plan including more scan lines.


Regarding the flight plan, because of the considerations made before about flight directions, and because the majority of building edges are aligned to the north-south and east-west directions, oblique flight lines in the northeast-southwest and northwest-southeast directions were flown. That results in parallel and perpendicular scan lines, allowing us to cover each portion of the processed area four times. Compared to regular Lidar missions, the number of proposed scan lines is twice as large. To process a 1050 x 1750 meter rectangular area, 20 scan lines were captured with the system at an average height of 500 m, resulting in swath widths ranging from 350 to 400 m. Figure 5-9 shows the analyzed area and some of the scan lines. In the following sections on preprocessing, classification, and modeling, data from only one of the scan lines, or from a small portion of the scanned area, will be used for illustration purposes.

5.4.2 Pre-Processing

To include information extracted from the return number related to each point, the scan line data must be preprocessed. The same process is carried out for each of the 20 scan lines. Figure 5-10 shows a portion of one scan line captured over buildings surrounding the football stadium at the University of Florida.

The Gemini system, used to capture the data and recently acquired by the University of Florida, has the capacity to record up to 4 returns per pulse. One file per return type is recorded. If the system records a fourth return, three previous returns for that pulse were also recorded. In subsequent discussions, the type of return (1st to 4th) associated with a recorded point will be called the return number, and "point" will refer to the three spatial coordinates of the return.


Figure 5-9. ALSM scan line boundaries of the data taken at the University of Florida campus. The white boundaries correspond to two of the twenty scan lines flown over the campus. The red boundary corresponds to the rectangular area where the footprint modeling method was applied. The x and y axes are measured in UTM (zone 17) meters.


Figure 5-10. Lidar points color coded by elevation. Buildings beside the football stadium are shown. The stadium is located at the upper left of the photograph in Figure 5-9.

Knowing the return number is important, but for building edge detection, knowing how many additional returns were recorded after each point for the same pulse is more valuable. For each pulse producing a first return, there are only four possibilities: the pulse has zero, one, two, or three additional returns. For a second return, there will be zero, one, or two additional returns; for a third return, zero or one; and after a fourth return, no additional return can be recorded. We call this number the number of additional returns, or return-under number. The illustration in Figure 5-11 shows an example of how these numbers are computed.

Figure 5-12 shows Lidar points classified by return number. Figures 5-13 and 5-14 show Lidar points classified by return-under number. In Figures 5-12 and 5-14, we can actually see building shapes delineated by the classified points. However, because of the scanning mechanism, the points (second, third, and fourth returns) shaping buildings in Figure 5-12, unlike the points in Figure 5-14 (points with additional returns recorded after them), are actually ground locations around the buildings.


Figure 5-11. Laser pulses hitting different surfaces. The black lines represent the trajectories of laser pulses. The blue and red numbers correspond to the return number and return-under number of each recorded point. For the pulse hitting the building roof, only a first return is recorded (blue number 1) and no additional returns are recorded (red number 0). For the pulse hitting the building edge, the first return (blue number 1) has a return-under number of 1 because of the subsequent return from the ground; for that second return (blue number 2), no additional return is recorded after it (red number 0). The same logic applies to the laser pulse obtaining three returns from the tree.

To obtain the return-under value for each point, the four return-type files are blended and sorted in descending order by time tag. This has to be done for the whole survey dataset because the only way to make sure the points are sequentially ordered is according to how they were captured. If a smaller region were defined for processing, any of the secondary returns (second, third, and fourth) could be lost if they hit the landscape outside the defined extent while the first return lies inside the region. Once the complete file is sorted, the processing is carried out, yielding a file that contains the three-dimensional coordinates, intensity, return number, and return-under number of each point.
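The return-under computation described above can be sketched as follows. This is an illustration, not the dissertation's preprocessing code: it assumes each record carries a time tag identifying its emitting pulse and its return number, and derives the return-under number as the pulse's highest return number minus the point's own.

```python
from collections import defaultdict

def return_under_numbers(points):
    """For each recorded return, count how many additional returns the same
    pulse produced after it (the 'return-under number').

    `points` is a list of (time_tag, return_number, ...) tuples, where the
    time tag identifies the emitting pulse (a hypothetical record layout)."""
    last_return = defaultdict(int)
    for time_tag, return_number, *_ in points:
        last_return[time_tag] = max(last_return[time_tag], return_number)
    return [last_return[t] - rn for t, rn, *_ in points]

# Pulse 1001 produced three returns (two in a tree canopy, one on the ground);
# pulse 1002 produced a single return on a roof:
records = [(1001, 1, 0.0, 0.0, 20.0), (1001, 2, 0.0, 0.0, 12.0),
           (1001, 3, 0.0, 0.0, 0.5), (1002, 1, 5.0, 0.0, 18.0)]
print(return_under_numbers(records))  # [2, 1, 0, 0]
```

A first return with return-under number greater than zero is the signature of a partial hit, e.g., on a building edge or in vegetation.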


Figure 5-12. Lidar points classified by return number (Classes 2 and 3). Orange points belong to Class 2, and green points belong to Class 3. Points in Classes 1 and 4 are omitted so that building details can be seen. The class numbers refer to the return number of each point. All the points shown here result from pulses hitting soft areas such as tree coverage or building edges. Third and fourth returns are found mostly in forested areas. Second returns cover more area and occur indistinctly on buildings and trees.

The second returns enclosing the buildings are actually the result of fractions of pulses hitting the ground or other structures after the main part of the pulse, recorded as a first return, hit the actual edges. This is why we process the return-under number instead for accurate building delineation.

The last preprocessing step deals with the existence of some misleading points among those with return-under number greater than zero. Because scan lines may in some cases be almost parallel to building edges, edges detected using on-edge points that are consecutive in the scan line can distort the actual edge, since the detected edge would follow the scan line direction rather than the building edge (see Figure 5-15). Preprocessing flags these points so they can be recognized in the modeling stage. Although no returns can be detected from those particular edges in that scan line, the perpendicular scan lines included in the capture allow the detection of returns on those edges.


Figure 5-13. Lidar points classified by return-under number (all classes). Dark grey points belong to Class 0, light grey points to Class 1, orange points to Class 2, and green points to Class 3. The class numbers refer to the return-under number of each point. Points with return-under number 0 are mostly first returns hitting hard surfaces, for which no additional returns are recorded; they can also be second, third, or fourth returns without any additional return recorded after them, as seen on the trees. Points with return-under numbers of 1, 2, and 3 are returns followed by additional returns. Most of the points in Class 2 delineate the building by giving the actual location where the laser pulse hit the edge; we can assume this because one, two, or three additional returns were recorded after the edge was hit. Some of the points in Classes 2 and 3 also hit edges, so they must be retained for subsequent processing.

Figure 5-14. Lidar points classified by return-under number (all classes but 1). All classes except Class 1 in Figure 5-13 are shown.


Figure 5-15. Misleading returns. Returns on building edges are described by the circular footprints. These returns can define erroneous edges along the scan line direction rather than the building edge direction.

5.4.3 Classification

The two main steps for footprint delineation are classification and modeling. In the classification part, preprocessed data are analyzed and labeled as ground or non-ground data. The non-ground data are further classified into building/non-building and edge/non-edge points. In the modeling step, the previous output is converted into building footprints. The flow chart in Figure 5-16 gives more information about the process.

In the classification part, non-ground points are first separated from the raw data. Then building points are separated from vegetation points and points on other structures. After that, using the return-under attribute of each point, points on building edges are detected.

Ground/non-ground classification. To segment non-ground points from ground data, a multi-scale adaptive filter was used (Kampa and Slatton, 2004). The filter is adaptive in the sense that the thresholds needed for classification are defined from actual training data. By iteratively applying a sliding window of decreasing size and using training data, points are assigned to the ground class by comparing their heights to the other point heights in the window. An EM algorithm is then used to statistically characterize probability distribution functions for vegetation classes.


Figure 5-16. Classification and modeling steps. In the classification part, classification tasks separate non-ground points from ground points in the preprocessed data, building from non-building points in the non-ground data, and on-edge from non-edge points in the building data. In the modeling part, a majority filter refines the previous classifications, linear segments of the detected building footprints are found, and the final footprint fitting using these cues is carried out.

Because the goal is to obtain non-ground points rather than ground points, an algorithm parameter can be tuned to make sure that all points on buildings are selected. The risk is that some points on low bushes may be accepted into that class. Results of the ground/non-ground classification carried out over the scan line depicted in Figure 5-10 are shown in Figure 5-17. Figure 5-18 shows non-ground points classified by the return-under attribute.
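To make the ground/non-ground idea concrete, the toy sketch below labels a point non-ground when it sits well above the lowest return in its grid cell. This is NOT the multi-scale adaptive filter of Kampa and Slatton (2004), whose thresholds come from training data and whose window shrinks iteratively; it is a single-scale simplification with hypothetical parameter values.

```python
import math

def ground_filter(points, cell=10.0, height_threshold=2.0):
    """Toy single-scale ground filter: a point is non-ground when it lies more
    than `height_threshold` above the lowest return in its `cell` x `cell`
    grid cell. Parameters here are illustrative, not trained."""
    z_min = {}
    for x, y, z in points:
        key = (math.floor(x / cell), math.floor(y / cell))
        z_min[key] = min(z_min.get(key, math.inf), z)
    return [z - z_min[(math.floor(x / cell), math.floor(y / cell))] > height_threshold
            for x, y, z in points]

pts = [(1.0, 1.0, 30.0), (2.0, 2.0, 30.2), (3.0, 1.5, 42.0)]  # ground, ground, roof
print(ground_filter(pts))  # [False, False, True]
```

Lowering the height threshold plays the role of the tunable parameter mentioned above: it ensures all building points are kept, at the cost of admitting some low bushes.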


Figure 5-17. Results from classification. Orange points correspond to the ground class. Green points are in the non-ground class.

Figure 5-18. Non-ground points classified by return-under number. Dark grey corresponds to Class 0, light grey to Class 1, orange to Class 2, and green to Class 3. Class 0 points (returns) are laser hits with no additional returns recorded after them for that pulse. Classes 1, 2, and 3 correspond to laser hits having 1, 2, and 3 additional returns recorded after them, respectively.


As discussed in Section 5.3.2 (system setting), and as Figure 5-18 shows, scanning at large scan angles makes it more difficult to detect edges. That happens at building edges located farther from the scanning system: at closer edges, secondary returns hit the wall of the same building, whereas at farther edges they hit other structures. Overlapping scans allow the system to obtain secondary returns from either one of the scan lines.

Ground/non-ground classification for some complex buildings also turns out to be difficult; non-ground points are sometimes labeled as ground points. However, because the modeling part takes return-under information into account, misclassified points can be detected and filtered out, as will be explained in later sections.

Building/non-building and edge/non-edge classification. Once the non-ground points have been obtained, further classification into building and non-building points has to be carried out. Building points, in this case, are the points on building roofs. A requirement the classification method must fulfill is that results be given in the form of data points instead of pixel cells, as is the case for classification methods working with gridded data.

For this classification, a method based on supervised learning and explained in Chapter 3 is used (Caceres et al., 2007; Caceres and Slatton, 2007). The method acquires knowledge from training data statistics. Spin images and surface fitting to a local neighborhood are used for object (Lidar point) feature extraction (Johnson, 1997). Because of the high dimensionality of the feature space, the k-nearest neighbor rule is used to provide a computationally efficient decision scheme based on class likelihoods.
One of the key aspects of the method is that, at classification time, the feature computation for each point needs only data from a small local neighborhood; no knowledge of the whole point cloud is necessary. That characteristic is helpful if near-real-time


classification has to be done. A more detailed discussion of the algorithm is found in the work of Caceres et al. (2007).

Because training is needed, five areas are used: 2 for building classes and 3 for vegetation classes. These areas are shown in Figure 5-19. This training data is used to process all the data captured in the twenty scan lines inside the 1050 x 1750 meter area enclosed by the red rectangle in Figure 5-9. After training, the classification process yields non-ground points in two classes: building and non-building. In addition to its spatial coordinates and classification label (ground, non-ground, non-ground/building, non-ground/non-building), each point is associated with the return-under attribute and the slope and aspect of the plane fitting its local neighborhood.

This segmentation approach overcomes some of the difficulties faced when separating Lidar returns into building and vegetation classes. This discrimination, usually a difficult task due to the close proximity of trees to buildings and house roofs, is effectively done by using spin images as a three-dimensional feature descriptor of each point's neighborhood (Caceres et al., 2007).

Figure 5-19. Training areas for building classification. The polygons at left correspond to the training areas for building roofs and trees; the rectangular areas are building training areas. The area depicted at left is an enlarged version of the data enclosed by the black rectangle at right.
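The k-nearest-neighbor decision rule mentioned above can be sketched in a few lines. The features shown are hypothetical two-dimensional placeholders, not the high-dimensional spin-image descriptors the method actually uses; only the decision scheme is illustrated.

```python
from collections import Counter
import math

def knn_classify(samples, query, k=5):
    """Minimal k-nearest-neighbor rule: the query takes the majority label
    among its k closest training samples in feature space.
    `samples` is a list of (feature_vector, label) pairs."""
    dist = lambda a, b: math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    nearest = sorted(samples, key=lambda s: dist(s[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Hypothetical 2-D feature vectors for 'building' and 'tree' training samples:
train = [((0.90, 0.10), 'building'), ((0.80, 0.20), 'building'),
         ((0.85, 0.15), 'building'), ((0.10, 0.90), 'tree'),
         ((0.20, 0.80), 'tree'), ((0.15, 0.85), 'tree')]
print(knn_classify(train, (0.88, 0.12), k=3))  # building
```

Because the rule only sorts distances to stored training samples, it needs no retraining when new classes are added, which matches the local, per-point nature of the classification described above.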


The algorithm analyzes each point and its three-dimensional local neighborhood and classifies points on building walls or on building roofs under vegetation. Because DSMs in gridded form present spatial information in only 2.5 dimensions, most classification algorithms cannot perform a true three-dimensional analysis.

Because the classified results are not 100% accurate, a majority filter is applied to correct misclassified points. For each point, the filter searches for the majority class among the neighbors and assigns that class to the point. The neighborhood is adaptively defined by a cylinder whose axis is perpendicular to the local tangent plane at the point; the adaptability of the algorithm comes from its capacity to adjust to the local slope orientation, which is defined using the aspect and slope values computed for each point at the building classification step. The circle radius depends on the point density of the data. Results of this processing are shown in Figure 5-20.

In addition to the building class, each point has a return-under attribute. Combined with the building classification, that attribute allows identification of points on building edges: points whose return-under attribute is greater than 0 are very likely to be on a building footprint. Figure 5-21 shows building points classified by return-under class.

Figure 5-20. Segmentation after majority filter. Green points correspond to the vegetation-and-other-structures class. Red points correspond to the building roof class.
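The majority filter can be sketched as follows. For simplicity this illustration uses a vertical cylinder with a fixed radius, whereas the dissertation's filter orients the cylinder along the local slope and aspect and scales the radius by point density; names and values are illustrative.

```python
from collections import Counter
import math

def majority_filter(points, labels, radius=2.0):
    """Simplified majority filter: each point takes the most common label
    within a horizontal search radius (a vertical-cylinder simplification
    of the slope-oriented adaptive neighborhood described in the text)."""
    out = []
    for (x, y, _z) in points:
        neighborhood = [lab for (px, py, _pz), lab in zip(points, labels)
                        if math.hypot(px - x, py - y) <= radius]
        out.append(Counter(neighborhood).most_common(1)[0][0])
    return out

points = [(0, 0, 10), (1, 0, 10), (0.5, 0.5, 10), (1, 1, 10)]
labels = ['roof', 'roof', 'tree', 'roof']  # one isolated 'tree' label in a roof patch
print(majority_filter(points, labels))  # ['roof', 'roof', 'roof', 'roof']
```

The isolated 'tree' label is outvoted by its roof neighbors, which is exactly the kind of speckle correction the filter is meant to provide.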


Figure 5-21. Building roof points classified by return-under number. A) Dark grey corresponds to Class 0; light grey corresponds to Classes 1, 2, and 3. The classified data correspond to the points previously classified as building points and shown in Figure 5-18. B) Only returns that have returns under them are shown; light grey corresponds to Classes 1, 2, and 3.

After classification is done, a model is fitted to the data. The fitting scheme is explained below.

5.4.4 Modeling

Before a footprint can be fit to the buildings, a point cloud for each building has to be obtained. Then, segmentation of the points making up building edges is carried out. Finally, linear features are adjusted to each segment, resulting in a quadrangular shape enclosing the building point cloud (see Figure 5-12).


Individual building segmentation by applying local adaptive majority filters. In labeling and classifying the data into individual building point clouds, the idea is to assign a point to a building cloud if the point is close enough to at least one of the points already assigned to the building. In addition, the algorithm must consider the existence of complex building roofs made up of several planar faces.

As previously done in the majority filter, distances are computed in a local reference frame whose orientation is determined by the slope of the tangent plane at each point. This reasoning is the basic idea behind most region growing algorithms, but in this case a graph-based algorithm is used to reach the same goal.

A data graph is created in which the nodes represent data points. A link joining two nodes indicates that the distance between the two points is less than some defined threshold. The graph construction is intended only for query purposes, and no refinement of the graph is done afterward. Once the graph is constructed, the classification is done by performing a breadth-first search (bfs) from each data point. The outcome of a bfs is the set of points that can be reached from the initial data point by traveling the graph links. The points reached from that initial point can be assumed to belong to the same building; in fact, the query produces the same results when the bfs is started from any of the points belonging to the building. By eliminating from the search the points already assigned to buildings and applying the same process to all remaining data points, a classification into individual buildings is achieved. The graph and the bfs query over it were implemented in Matlab using an existing Matlab library (Gleich, 2007).

Applying this algorithm, a first segmentation groups all the points belonging to the same building roof part by doing the search in 3D space.
The minimum distance threshold is


set to twice the accuracy value expected for single Lidar points (see the accuracy analysis in Section 5.3.3). Simple building roofs may be made up of a single roof part, whereas complex ones may have more than one. Figures 5-22 and 5-23 show the segmentation into building parts of some of the classified points shown in Figures 5-20 and 5-21.

A second segmentation groups building roof parts into a unique building roof by doing the search in 2D space. For this case, the minimum distance threshold is set to four times the expected accuracy of single laser returns. For simple building roofs, this results in a building group composed of a single roof part; for complex building roofs, the roof parts are fused into a single set. Figure 5-24 shows the segmentation into building roofs of some of the classified points shown in Figures 5-20 and 5-21.

For footprint delineation purposes, even though the points on complex buildings are fused, the information about which part of the roof each point belongs to is saved. Accordingly, after the majority filters are applied, three attributes are available for each processed building point: the on-edge attribute, defined by the return-under number (points with return-under number greater than zero are very likely to represent returns hitting building roof edges, while points with return-under number equal to zero are very likely to be on the roofs but not on the edges); the building part identifier, which assigns the point to a building part; and the building identifier, which assigns the point to a building roof.

From the set of building points with these attributes, linear segments are extracted through a segmentation process explained below. The segmentation is done for points on the building parts. For the final fitting, the fact that building parts can be grouped into a complex building roof is used.
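The graph-plus-bfs grouping described above amounts to connected-component labeling, which can be sketched as follows. This is a Python illustration of the idea, not the Matlab implementation (Gleich, 2007) the dissertation used; the brute-force link construction stands in for a spatial index.

```python
from collections import defaultdict, deque

def segment_buildings(points, threshold):
    """Group points into connected components: two points are linked when their
    Euclidean distance is below `threshold`, and a breadth-first search collects
    every point reachable from a seed. Returns one component label per point."""
    n = len(points)
    dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    adj = defaultdict(list)
    for i in range(n):                 # O(n^2) link construction; a spatial
        for j in range(i + 1, n):      # index would be used on real point clouds
            if dist(points[i], points[j]) < threshold:
                adj[i].append(j)
                adj[j].append(i)
    label, next_label = [-1] * n, 0
    for seed in range(n):
        if label[seed] != -1:          # already reached from an earlier seed
            continue
        queue = deque([seed])
        label[seed] = next_label
        while queue:                   # breadth-first search from the seed
            u = queue.popleft()
            for v in adj[u]:
                if label[v] == -1:
                    label[v] = next_label
                    queue.append(v)
        next_label += 1
    return label

pts = [(0, 0, 5), (1, 0, 5), (0, 1, 5), (50, 50, 8), (51, 50, 8)]
print(segment_buildings(pts, threshold=2.0))  # [0, 0, 0, 1, 1]
```

Running the same search on (x, y) only, with a looser threshold, gives the second-stage grouping that fuses roof parts into whole buildings.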


Figure 5-22. Building points color coded by building part identifiers (top view). Building areas with the same color but separated by another building part are actually different building parts.

Figure 5-23. Building points color coded by building part identifiers (oblique view).

Figure 5-24. Building points color coded by building identifiers.


Edge segmentation by point clustering and return-under attribute knowledge. Once building point clouds are obtained for each building part, a segmentation of linear edge features is obtained from the Lidar building points located on edges (those whose return-under number is greater than zero). The final result of this process is a set of points classified by the footprint edge on which each point lies. The segmentation exploits the rectilinear edges most buildings show, along with the fact that the points in the point cloud have an accuracy that depends on the scanning system. These inputs are used in a clustering scheme that detects point clusters with an elongated (linear) shape.

The idea of detecting linear edges before footprint fitting comes as a natural step, and it has been used before. Zhang et al. (2006) use the Douglas-Peucker algorithm for generalizing an initial footprint (Douglas and Peucker, 1973). The algorithm removes unnecessary border points and leaves the essential points that preserve the footprint shape by comparing a threshold parameter to point distances from a main direction; points between the essential points are assumed to be on the same edge. Sampath and Shan (2007) use a simpler approach, following the boundary points and creating new edge segments when abrupt changes in direction are found. Defining an abrupt change depends on a user-specified threshold. These approaches are empirical and depend on cell size or user knowledge.

Our approach, on the other hand, relies on unsupervised learning (clustering), where the clustering criterion is based on the accuracy of the Lidar points. The clusters are created as the points are processed sequentially. Although the criterion is based on accuracy values that can be seen as thresholds, these values are strongly tied to the way the data are captured and the sensor system used, so they do not depend on user decisions.
The clusters are rectilinear segments, and the criterion for deciding whether a considered point is assigned to a cluster is its


closeness to a cluster representative (Theodoridis and Koutroumbas, 2003). In our case, the representative is the line that best fits the points in the cluster in an orthogonal least squares sense.

Our clustering method compares the minimum distance between the point to be clustered and the cluster representative. Since the data points have an associated accuracy value, any considered point located on a rectilinear edge must fall inside an accuracy buffer around the line to be considered a point on that edge: if the given horizontal accuracy of the data is \sigma, we can assume that any point whose distance is less than \sigma belongs to that line. In addition, we need to keep in mind that the scanning system records a return if any part of the pulse footprint hits the edge. Because the footprint has a diameter fD, an uncertainty about how far the center of the pulse footprint is from the edge is added to \sigma. This uncertainty is not larger than half of fD, which makes the decision threshold dT as large as \sigma + fD/2. Because fD depends on the average flight height, the parameter can be computed if that height is available.

Some points do not fall inside any buffer zone and are not assigned to any current cluster representative, so the procedure may not be clustering in the strictest sense; however, it fits the clustering definition if those points are considered filtered outliers. The criterion used ensures that compact linear clusters are built, but it does not prevent a cluster from containing coincident points that are not on the same edge. Figure 5-25 shows one building part with this kind of cluster. For that reason, a coarse initial edge definition is carried out, which allows us to apply the clustering process to smaller point sets. To obtain this initial coarse partition, the order of the points in the initial on-edge point set is needed.
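The sequential, line-representative clustering can be sketched as follows. This is a simplified illustration, not the dissertation's implementation: it assumes the on-edge points arrive ordered along the boundary and grows the current cluster while every member stays within the decision threshold dT of the orthogonally fitted line.

```python
import math

def orthogonal_line_fit(pts):
    """Total-least-squares line through 2-D points: returns the centroid and
    the unit direction of the principal axis (orthogonal regression)."""
    n = len(pts)
    cx = sum(p[0] for p in pts) / n
    cy = sum(p[1] for p in pts) / n
    sxx = sum((p[0] - cx) ** 2 for p in pts)
    syy = sum((p[1] - cy) ** 2 for p in pts)
    sxy = sum((p[0] - cx) * (p[1] - cy) for p in pts)
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    return (cx, cy), (math.cos(theta), math.sin(theta))

def cluster_edge_points(points, dT):
    """Grow rectilinear clusters sequentially: a new point joins the current
    cluster while every member stays within dT (= sigma + fD/2) of the
    orthogonally fitted line; otherwise it seeds a new cluster."""
    clusters = [[points[0]]]
    for p in points[1:]:
        trial = clusters[-1] + [p]
        if len(trial) == 2:            # two points always define a line exactly
            clusters[-1].append(p)
            continue
        (cx, cy), (dx, dy) = orthogonal_line_fit(trial)
        residual = max(abs(-dy * (q[0] - cx) + dx * (q[1] - cy)) for q in trial)
        if residual <= dT:
            clusters[-1].append(p)
        else:
            clusters.append([p])
    return clusters

# Two perpendicular building edges sampled with small noise:
edge = [(0, 0), (1, 0.05), (2, -0.04), (3, 0.02), (3.0, 1.0), (3.02, 2.0)]
print([len(c) for c in cluster_edge_points(edge, dT=0.3)])  # [4, 2]
```

The corner point (3.0, 1.0) breaks the fit of the first, roughly horizontal cluster and seeds the second, vertical one, which mirrors how the buffer criterion splits points across footprint edges.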
According to this, the procedure to obtain edge segments encompasses definition of the


order of the initial on-edge points, initial coarse segmentation, and final clustering-based segmentation. A description of these steps is given below.

Order point definition. Contrary to the approaches by Zhang et al. (2006) and Sampath and Shan (2007), where detection implies that points are polygon vertices with some ordered location within a polygon, the on-edge points only give information about their location on the building. Therefore, an initial footprint must be obtained by generating a list of points indicating that a link exists between consecutive points. When gridded data are used and the edges are extracted from building regions, the polygonal side from one point is obtained by looking at the closest point in the set of edge points. In a raw data approach (Sampath and Shan, 2007), the use of a modified convex hull automatically yields a polygon definition.

Figure 5-25. Clustered points from different edges. Black dots correspond to the initial point set to be segmented. White circles correspond to the set of points in the same cluster. The white circle at the highest position is on a different building edge from the edge holding the remaining points in the cluster.


Because of the irregular spacing and the fact that returns are not always obtained from edges, distances between pairs of closest on-edge points will not be regular, which makes it impractical to use modified convex hull approaches or the alpha-shape algorithm (Edelsbrunner et al., 1983). A method that overcomes large deviations of the distances between on-edge points is presented by Melkemi and Djebali (2000); however, it is a generalization of alpha shapes and filling convex hulls, and, as a hull generator, it is prone to exclude on-edge points from the polygon definition, which is undesirable.

The method we use to obtain links between points is based on the solution of the traveling salesman problem (TSP). The problem, much easier stated than solved, asks for the minimum-distance tour through several cities, each visited exactly once by a salesman. The tour is closed in the sense that the starting city is visited again at the end of the tour. Similarly to a TSP tour, we intend to obtain an initial footprint where the distances among linked points are minimized and all the points are included in the footprint. Finding the exact solution of the problem is NP-hard (not solvable in polynomial time) and has been the object of extensive research and literature in the computer science and mathematics fields (Applegate et al., 2006). However, without an exact solution in mind, we use the TSP analogy to obtain in polynomial time a footprint that resembles an approximate (not exact) optimal TSP tour. The footprint obtained has the advantage of capturing the major, minor, and additional building directions using all the on-edge points.

To give an approximate solution of the TSP, our method first looks for the minimum spanning tree (MST) of the on-edge point set. The MST is the set of links that allows us to reach any point in the set from any initial point and whose total distance cost is minimized. The computation of an MST is a standard and very fast task.
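The MST step can be sketched with Prim's algorithm on Euclidean distances; this is an illustrative stand-in, since the dissertation does not specify which standard MST algorithm was used:

```python
import math

def minimum_spanning_tree(points):
    """Prim's algorithm on 2D points with Euclidean edge costs.

    Returns a list of (parent_index, child_index) links describing the
    tree; following the links child-to-child traces the point set.
    """
    n = len(points)
    in_tree = [False] * n
    best_cost = [math.inf] * n   # cheapest connection cost to the tree
    best_parent = [-1] * n
    best_cost[0] = 0.0
    links = []
    for _ in range(n):
        # Pick the cheapest point not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]),
                key=lambda i: best_cost[i])
        in_tree[u] = True
        if best_parent[u] >= 0:
            links.append((best_parent[u], u))
        ux, uy = points[u]
        # Relax connection costs for the remaining points.
        for v in range(n):
            if not in_tree[v]:
                d = math.hypot(points[v][0] - ux, points[v][1] - uy)
                if d < best_cost[v]:
                    best_cost[v] = d
                    best_parent[v] = u
    return links
```

For roughly collinear edge points, each parent tends to have a single child, which is exactly the deep-tree case the footprint construction exploits.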
If the tree is a deep one, where each parent has only one child, we can obtain the footprint just by following each tree node and adding each


one of them as a polygon vertex. However, because these trees are likely to have parent nodes with more than one child, our algorithm is correspondingly restricted. This is done by iteratively computing MSTs until a condition is reached, eliminating links of unwanted child-parent relationships between MST computations. The final footprint is an approximate TSP tour that is not necessarily the one traveled with minimal cost, but it is built the fastest while still preserving building directions. Figure 5-26 shows an initial building footprint resulting from the set of on-edge points and the links between consecutive points.

Initial coarse segmentation. To cluster the data, initial edge segments must ensure that at least one definitive edge can be extracted. The ideal way to define these segments is to detect corner points, where a drastic change of direction is found when following the boundary points. These kinds of points are found indirectly when shape generalization algorithms are used. An example of a generalization algorithm is the one proposed by Douglas and Peucker (1973). This algorithm has mainly been used as a technique for simplifying geographic map boundaries at restricted scales; at smaller scales, precise detail is not needed.

Figure 5-26. Initial building footprint. Black dots correspond to on-edge points. The points are connected by links defined by an MST-based algorithm.
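For reference, the Douglas and Peucker (1973) simplification just mentioned can be sketched as a short recursion; the parameter names are ours, and this minimal version keeps a vertex only when it lies farther than the tolerance from the chord joining the current endpoints:

```python
def douglas_peucker(points, tolerance):
    """Recursive polyline simplification (Douglas and Peucker, 1973)."""
    if len(points) < 3:
        return list(points)
    (x1, y1), (x2, y2) = points[0], points[-1]
    dx, dy = x2 - x1, y2 - y1
    norm = (dx * dx + dy * dy) ** 0.5 or 1.0
    # Perpendicular distance of each interior vertex to the chord.
    dists = [abs(dy * (x - x1) - dx * (y - y1)) / norm
             for x, y in points[1:-1]]
    i = max(range(len(dists)), key=dists.__getitem__)
    if dists[i] <= tolerance:
        return [points[0], points[-1]]       # drop all interior vertices
    split = i + 1                            # keep the farthest vertex
    left = douglas_peucker(points[:split + 1], tolerance)
    right = douglas_peucker(points[split:], tolerance)
    return left[:-1] + right
```

The retained vertices are exactly the corner-like points; the drawback noted below is that the result depends on a user-chosen tolerance.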


The technique is based on the idea that a polygon vertex can be eliminated if its distance to a linear segment, defined by a predecessor and a successor point in the polygon boundary, is smaller than a certain threshold. The threshold is a function of the linear segment. Because the corner points are found indirectly and a threshold is needed, an algorithm that looks directly for these dominant (corner) points and does not need any threshold is preferred. The works by Teh and Chin (1989) and Marji and Siy (2003) fall into this category. Because of its fast execution, we partially implemented Marji and Siy's algorithm for the initial coarse segmentation.

Marji and Siy's algorithm implements a voting scheme in which each point in the initial set votes for two points, one preceding and one succeeding it, where corners are believed to be. The criterion to choose the points is based on a function to be iteratively maximized. This function is calculated using the perpendicular distances of all the points between two points to the line defined by those two points. The most-voted points make up the set of dominant points. The points between dominant points define the set of initial segments subject to final clustering segmentation. Our partial implementation detects the points necessary to ensure that no points on different, non-consecutive building edges end up in the same cluster. Figure 5-27 shows an initial coarse edge segmentation for the building in Figure 5-26.

Final clustering-based segmentation. Our segmentation approach is based on sequential clustering. In this kind of processing, the initial number of cluster seeds, their definition, and the order in which the points are tested against the seeds are crucial to obtaining good-quality clusters. In our approach, the order for the points to be tested is already defined by the order given by the MST-based algorithm.
Therefore, the best values for the number and definition of the initial seeds must be found beforehand. This aspect of clustering has been the object of extensive research because naïve (brute-force) approaches for finding these values are not computationally feasible due


to the exponential number of possibilities. However, in the building footprint context we can use additional information to constrain our search: the number of initial seeds must be the number of building edges present in the data, and the points must be spatially close and clustered in linear shapes that obey accuracy restrictions.

Following these premises, we can define candidate seeds and choose from that set a list of the final seeds to be used in the clustering. The initial list of candidate seeds is created by defining any group of four consecutive points as a seed. This greatly reduces the number of possible seeds to (n − 4), where n is the number of points to segment. Once candidate seeds are defined, the clusters obtained from those seeds, by testing whether or not the points to be segmented belong to the seed, are evaluated regarding their quality. A quality measure thus has to be defined. This measure must incorporate how well the line fits the cluster, how many points are in the final cluster, and how regularly they are separated from each other. We correspondingly define a quality measure (qm) as:

qm = (e / n) · s  (5-12)

where e is the standard deviation of the perpendicular distances of the points in the cluster to the fitting line, n is the number of points in the cluster, and s is the maximum original position distance, that is, the number of spots we have to follow to go from one point to another in the non-segmented original set (found between consecutive points in the cluster). The seeds generating clusters with smaller qm are the ones selected as definitive seeds. By dividing e by the number of points, we give weight to the clusters whose points, on average, contribute the least to the final qm value. By multiplying that factor by s, we penalize clusters including non-consecutive points in the original set.


Figure 5-27. Initial coarse edge segmentation. Black dots correspond to the point set to be segmented. Black stars correspond to dominant points in the shapes. Points between consecutive dominant points are initial edges used for final clustering.

To sum up, the clustering-based algorithm encompasses two main steps. First, the initial seeds and their number are defined in a list of final initial seeds. Then, a sequential clustering using the seeds in that list is carried out. The algorithm for selecting the initial seeds and their number works as follows:

1. Initial candidate seeds are defined by grouping 4 consecutive points. If the initial point set has n points, the number of candidate seeds is (n − 4).
2. For each candidate seed, all the points in the original point set are tested and added to the cluster if their minimum distance to the cluster representative line falls within the accuracy-based tolerance. The final cluster generated from that seed is evaluated using the quality measure (qm) previously defined. The seed yielding the best cluster is assigned the first spot in the list of final seeds.
3. For the remaining candidate seeds and the remaining available points (those that have not yet been assigned to any cluster from a final seed), the same process is carried out. In each iteration over the remaining candidate seeds and available points, one seed is assigned to the list of final seeds.


This process continues until no more points or seeds are available. The final result is a list of final seeds sorted according to the quality of their clusters.

Once we have the final list of seeds, the whole set of points is tested against the seeds. For the first final seed in the sorted list, the available points are added to the cluster if their minimum distance to the cluster representative falls within the accuracy-based tolerance. Once no more points enter the cluster, the next-best seed is taken and the remaining points are tested against its cluster. The process continues until no more points are available for clustering or no additional seed is left. In Figure 5-28, the clustering of the points on an edge defined by the initial coarse segmentation is shown. In Figure 5-29, all segments found are shown.

Figure 5-28. Final clustering of points on an edge from the initial coarse segmentation. A) The points on the initial edge are all the points between the points labeled with the black stars, plus the stars themselves. One of the clusters in that initial segment is made up of the encircled points. The cluster corresponds to the longer building edge on the lower part of the photograph. As seen in the photograph, that edge is surrounded by two shorter edges with the same direction, close across the edge direction. The edge is successfully clustered by the algorithm. B) Picture of the building.
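The greedy seed-selection loop can be sketched as below; `cluster_from_seed` and `quality` are hypothetical stand-ins for the accuracy-based membership test and the qm value of Equation 5-12, which would be supplied by the surrounding pipeline:

```python
def select_final_seeds(points, cluster_from_seed, quality):
    """Greedy seed selection for the sequential clustering.

    cluster_from_seed(seed, available) grows a cluster from a 4-point
    seed out of the still-available point indices; quality(cluster)
    returns the qm value (smaller is better).
    """
    n = len(points)
    candidate_seeds = [tuple(range(i, i + 4)) for i in range(n - 4)]
    available = set(range(n))
    final_seeds = []
    while available and candidate_seeds:
        scored = []
        for seed in candidate_seeds:
            if not set(seed) <= available:   # seed partially consumed
                continue
            cluster = cluster_from_seed(seed, available)
            scored.append((quality(cluster), seed, cluster))
        if not scored:
            break
        _, best_seed, best_cluster = min(scored)
        final_seeds.append(best_seed)
        available -= set(best_cluster)       # points leave the pool
        candidate_seeds.remove(best_seed)
    return final_seeds
```

Each pass removes the best seed's cluster from the available pool, so the returned list is already sorted by cluster quality, mirroring the sorted list of final seeds described above.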


Figure 5-29. Final segmentation of on-edge points. Consecutive points represented by same-colored squares are in the same segment. Some of the points are not clustered because, when added to the clusters, they do not meet the accuracy-based threshold.

Use of additional returns with zero return-under number. Due to the complexity of some building roofs and the obstruction by nearby vegetation and structures, detection of on-edge returns is not always feasible. Consider part of the roof of Weil Hall, a building beside the football stadium at the University of Florida, shown in Figure 5-30. For this building, some northern roof edges in close proximity to trees are not covered by on-edge points. Overhanging roof parts also complicate detection on the south portion of the roof.

To obtain line segments from those edges, points on the roof but not on any edge can be used. Even though we cannot be sure those points hit edges, we can assume that their distance to those edges is less than the distance between consecutive points in the scan line. Depending on the Lidar system settings, this distance can be as low as half a meter. To use these points, we need to select them from the whole set of zero return-under points. The problem of finding the points defining the boundary of the roof part is attacked in our approach by the use of on-edge returns. This needs to be solved by another method when on-


edge returns are not available. We use the following algorithm, based on the method used by Zhang et al. (2006):

1. Grid the zero return-under point data at a certain resolution based on the point-to-point distance in the scan line.
2. Find the boundary cells by applying a convolution mask to each cell.
3. If on-edge points do not exist inside a boundary cell, define new point coordinates by extracting the boundary point coordinates from the coordinates of the points inside the cell.

Figure 5-30. On-edge and non-on-edge points on Weil Hall. A) Detected on-edge points on the Weil Hall building. Blue points correspond to on-edge returns. Red points correspond to points with zero return-under number. B) Picture of the building.
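The boundary-cell test with a 3×3 all-ones mask (the idea illustrated in Figure 5-31) can be sketched as follows; the binary grid representation is an assumption for illustration:

```python
def boundary_cells(grid):
    """Flag boundary cells of a binary building grid.

    A building cell (value 1) is interior when the 3x3 all-ones mask
    over it and its 8 neighbors sums to 9; any smaller sum marks a
    boundary cell (neighbors outside the grid count as 0).
    """
    rows, cols = len(grid), len(grid[0])
    boundary = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != 1:
                continue
            total = 0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        total += grid[rr][cc]
            if total < 9:
                boundary.append((r, c))
    return boundary
```

Since the mask is all ones, the "convolution product" reduces to a neighborhood sum, which is why 9 signals a fully interior cell.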


Figure 5-31. Convolution mask for identifying boundary cells. The grid at left represents a building. One-valued cells are building cells. The result of the convolution product of the mask with the blue cells is 9, and the cell is not on the boundary. When the product is applied to gray (boundary) cells, the result is less than 9.

The differences of our algorithm, compared to the one described in Zhang et al. (2006), concern the mask used and the criteria for defining the cell coordinates of the representative point of the boundary cell. In Zhang's approach, a cell is a boundary cell if at least one of its 8 neighbor cells is not a building cell. In convolution mask terms, if a 3x3 matrix, made up of a cell holding a value of 1 and its 8 neighbors, is multiplied term by term by a 3x3 mask containing only 1 values, the cell is not on the boundary if the result is nine; otherwise it is a boundary cell (see Figure 5-31). The boundary coordinates are defined by the cell coordinates at its center.

Because accurate positions must be obtained, we use a different mask that allows us to apply more specific, location-based criteria for coordinate definition. The purpose is to use the points inside the cell and the orientation of the edge to designate the coordinates of a point. This point is the representative of all the points inside the cell. If more than one point is available in the cell, we can use the mean


coordinates of all points to get the representative point. If the point is on a horizontal edge, the representative y-coordinate can be the minimum or the maximum value among all the y-coordinates. A different mask that takes into account edge direction is used; Figure 5-32 shows the mask. By defining a mask that outputs a unique result for each of the considered cases, we can define the coordinates of the representative point for each cell.

As seen in the figure, a result of 1111 indicates a non-boundary cell. If the result is 1101, the point is on a vertical edge; the representative y-coordinate is then calculated via averaging, and the representative x-coordinate is the maximum x value. The maximum x-coordinate should be the one closest to the actual x-coordinate of the point at that y-coordinate. If the point on that vertical edge were on the west side, the output would be 111 and the representative x-coordinate would instead be the minimum of the x-coordinates.

Figures 5-33 to 5-36 show different stages of the gridding / coordinate search / segmentation processes. Figure 5-33 shows the grid boundary. Figure 5-34 shows the representative points that are added to the on-edge points in Figure 5-30. Figure 5-35 shows the dominant points of the shape. Finally, Figure 5-36 shows the different segments found on the building part.

Footprint fitting. In the fitting, the set of point segments on each building part is used to obtain 2D lines, which are used in the building part footprint delineation. The adjustment is done in the least squares sense and can be restricted according to the building's dominant directions if they exist. The corners are defined following a set of conditions over the adjusted lines. The same 2-step process (adjustment and corner definition) is applied again when building parts make up a complex building roof. A description of these steps is given below.


Figure 5-32. Convolution mask for identifying boundary cells and edge direction. The grid at lower left represents a building. One-valued cells are building cells. The grid at upper left is the mask. The result of the convolution product of the mask with the blue cells is 1111, and the cell is not on the boundary. When the product is applied to cells on horizontal edges, the result is 1110. When the product is applied to cells on vertical edges, the result is 1101.

Figure 5-33. Gridded boundary for the building part in Figure 5-30.


Figure 5-34. Representative cell points on the Weil Hall building. Green points correspond to the representative points. Red points correspond to points with zero return-under number.

Figure 5-35. Initial coarse edge segmentation. Red dots correspond to the point set to be segmented. Blue dots correspond to dominant points in the shape. Points between consecutive dominant points are initial edges used for final clustering.

Figure 5-36. Final segmentation of on-edge points. Consecutive points represented by same-colored squares are in the same segment.


Line definition by least squares-based adjustment. For the set of points in each segment, the best-fitting line is obtained by minimizing their perpendicular distances to the line via least squares. The equations for the perpendicular case are more complex than those for the vertical-offset case, in which the distances are the vertical ones from the points to the line in the 2D Cartesian coordinate system. This adjustment is the same as the one used in Section . Appendix A shows the development of the necessary equations to find the coefficients (A, B, and C) of the line equation in the general form. Along with the general, simplest case of fitting one line to a point segment, the equations for a restricted adjustment are developed. As for those restrictions, equations that enforce parallelism and perpendicularity conditions between point segments are shown in the appendix.

Although on-edge points will give us good estimated lines, the fact that actual building roof edges may be aligned along some main dominant directions can be used. From the least squares point of view, the fact that some building edges share the same orientation, defined by a dominant direction, results in fewer unknowns to be computed, and positional accuracy increases for the adjusted line parameters. Our approach clusters segments according to line slope, thus indirectly finding dominant directions. The fitting method includes the following steps:

1. Adjusting a line to each point segment by a perpendicular-distance fitting.
2. Clustering the lines according to their orientation and collinearity.
3. New adaptive clustering and adjusting with a parallelism restriction over segment clusters.

In the first adjustment, the line coefficients in the general form are transformed to the normal form of a line. The normal form of a line is defined by the expression x cos θ + y sin θ = ρ, where θ is the line orientation angle and ρ is the perpendicular distance from the coordinate origin to the line.
This transformation is carried out to check clustering conditions.
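A minimal sketch of the perpendicular (orthogonal) fit and the conversion to normal form is given below; it uses the principal-direction shortcut (smallest-scatter direction of the centered points) rather than the dissertation's Appendix A derivation, so function and variable names are ours:

```python
import math

def fit_line_normal_form(points):
    """Orthogonal least squares line fit, returned in normal form
    x*cos(theta) + y*sin(theta) = rho.

    Minimizing perpendicular distances is equivalent to taking the
    direction of greatest scatter of the centered points as the line
    direction; the line normal is 90 degrees away from it.
    """
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    syy = sum((y - my) ** 2 for _, y in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    # Angle of the principal (largest-scatter) direction.
    direction = 0.5 * math.atan2(2 * sxy, sxx - syy)
    theta = direction + math.pi / 2          # normal direction
    rho = mx * math.cos(theta) + my * math.sin(theta)
    return theta, rho
```

Working in (θ, ρ) makes the subsequent clustering natural: orientation clusters compare θ, and collinearity clusters compare ρ.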


Once the line equations are obtained, segments are first clustered by their orientation. The clustering method is the same one used in Section 5.4.4.2, when points were grouped to obtain point segments. The angle tolerance for clustering is defined to be 15 degrees, assuming the building may have up to three main directions. In addition, collinearity conditions between segments are checked by clustering the segments via the ρ distance. The tolerance is set to twice the expected accuracy of the laser returns. Collinearity may occur between consecutive or non-consecutive edge segments.

If most segments are aligned to two main directions, a new clustering and adjustment with twice the initial tolerance is carried out. The majority-segment condition is evaluated by first sorting the clusters according to their total segment length (adding the segment lengths in each cluster), and second computing the ratio of the segment lengths in the first two clusters to the total segment length. If the ratio is larger than a threshold, two main directions are present, and a larger angle tolerance will improve line accuracy.

Corner definition. Once the line equations are provided, corners can be found by sequentially processing the line equations. The segments were sorted following the building edge when the dominant points were obtained in the edge segmentation stage (see Section 5.4.4.2). The algorithm rules to define corners include:

1. If two consecutive line slopes are different, a corner is defined by the intersection of the two lines. The angle threshold is set to 30 degrees, assuming that the building may have up to three main directions.
2. If two consecutive lines are parallel, a corner pair is defined by using a line perpendicular to the consecutive lines. The corners are found by intersecting the perpendicular line with each of the consecutive lines.
The perpendicular line also must intersect the line defined by the last point of the first segment and the first point of the second segment. When, in addition, the lines have the same ρ value, the pair is reduced to a unique corner.
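The first corner rule, intersecting two non-parallel fitted lines given in general form A x + B y + C = 0, can be sketched as follows (names and the None convention for the parallel case are ours):

```python
import math

def corner(line1, line2, angle_threshold_deg=30.0):
    """Corner between two consecutive fitted lines (general form A, B, C).

    Returns the intersection point when the lines differ in orientation
    by more than the threshold; returns None for near-parallel lines,
    which the text handles separately with a perpendicular closing line.
    """
    a1, b1, c1 = line1
    a2, b2, c2 = line2
    t1 = math.atan2(-a1, b1)        # direction of A1*x + B1*y + C1 = 0
    t2 = math.atan2(-a2, b2)
    diff = abs(t1 - t2) % math.pi   # undirected angle between the lines
    diff = min(diff, math.pi - diff)
    if math.degrees(diff) <= angle_threshold_deg:
        return None
    # Cramer's rule on A1*x + B1*y = -C1, A2*x + B2*y = -C2.
    det = a1 * b2 - a2 * b1
    x = (b1 * c2 - b2 * c1) / det
    y = (a2 * c1 - a1 * c2) / det
    return (x, y)
```

Chaining this over the ordered segment list yields the polygon vertices directly, with the parallel cases handled by the second rule above.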


Figures 5-37 and 5-38 show two stages of the footprint definition for the Weil Hall building part. Figure 5-38 shows the final adjusted footprint after processing the first set of segments in Figure 5-37.

Adjustment of complex building roofs. After applying the majority filters, complex building roofs can be defined as those made up of simpler parts. In these cases, the whole set of points and segments in each building part is used to globally adjust the complex roof. The same adjustment and corner definition method applied to each single part is applied to the complex one. Figure 5-39 shows the global adjustment of the building parts of the complex roof shown in Figures 5-22 to 5-24.

Figure 5-37. First footprint adjustment of the point segments shown in Figure 5-36.

Figure 5-38. Final footprint adjustment of the point segments shown in Figure 5-36.


Figure 5-39. Footprint for a complex building roof.

5.5 Results

5.5.1 Footprint Delineation Results

The method for footprint delineation was applied to the data shown in Figure 5-9. Figures 5-40 and 5-41 show the results. Footprints for complex roofs (Figure 5-40) are the result of fusing the building parts shown in Figure 5-41. The only process where interaction was necessary was the building-tree data training, this being the only non-automated stage; however, very little training was needed for the final segmentation, and if the training data are stored beforehand, the process can be fully automated.

The additional delineation of building parts offers a more detailed description of building shape than the one commonly shown in footprint maps. Most of the buildings in the area are detected and their footprints are delineated. Footprints for a few buildings were not delineated because no on-edge returns were detected on them. Also, a few footprints are only partially delineated due to occluding objects or scan capture irregularities. An example of footprints with building parts is shown in Figure 5-42. Because on-edge returns were detected for each part of the building, footprints for the two parts of the building are individually adjusted. In a later step, the adjustment is done globally using the two parts.


Figure 5-40. Building footprints in the UF area.


Figure 5-41. Building footprint parts in the UF area. Building parts are shown for complex roofs.


Figure 5-42. Footprint of the Bryant Space Science Center at the University of Florida. A) Footprint of the building. The building is the one at left, made up of four parts. All the parts are flat surfaces leveled at different heights. Usually, details such as the highest L-shaped roof on top of the larger rectangular roof are not drawn. B) Photograph of the same building.

Although roofs with flat surfaces may seem easier to delineate, footprints of building roofs containing both flat and gabled surfaces, such as Weil Hall in Figure 5-30 and the building roof in Figure 5-43, can also be obtained. Even though the building filter is able to detect building points under occluding vegetation, in some cases on-edge points and even roof points cannot be detected. Figure 5-44 shows an example of that situation. The detection failure in these cases is related to the fact that the tree branches are very close to the roof and at times even touch it. Because of the sensor's blind zone (the minimum distance needed to detect multiple returns), return detection is not viable in those cases.

The important fact to see in Figures 5-40 to 5-44 is that building edges are computed from least squares adjustments using direct measurements provided by returns hitting the actual building edges. No previous approach has done this.


Figure 5-43. Footprint of Dauer Hall at the University of Florida. A) Footprint of the building. The building roof is made up of four parts. Most parts are gabled surfaces at different heights. B) Photograph of the same building. Despite the presence of several dormers, the algorithm delineates the edges beside them.

Figure 5-44. Footprint of the Computer Science and Engineering building and the Marston Science Library at the University of Florida. A) The footprint is delineated as a single one because, at the 2D clustering stage, points from both buildings are in the same cluster. The leftmost edge is partially delineated because of occlusion by the very close trees shown in the photograph at right. B) Photograph of the same buildings.


5.5.2 Accuracy Results

To test how accurately corner locations are determined, GPS measurements on building roof corners were carried out. Figure 5-45 shows the corners on the University of Florida campus. The GPS observation data were collected using a combination of Ashtech Z-Surveyor and Z-Xtreme dual-frequency, carrier-phase receivers with Ashtech 700936 Rev D choke ring antennas and 700700C dual-frequency marine antennas. The receivers operated in static mode, collecting measurements at a frequency of 1 Hz. Each station was occupied for 20 to 30 minutes.

The observation data files were processed using the NGS On-line Positioning User Service Rapid Static tool. Coordinates were reported in the NAD83 datum using UTM zone 17. Accuracies for this ground truth are at the sub-centimeter level. In Table 5-2, RMSE values for corner coordinates of UF buildings are presented, along with summary RMSE values.

Figure 5-45. Location of GPS points for accuracy testing. The points are signaled by the red squares; 24 locations were measured. A fraction of the football stadium is located in the upper left portion of the picture.
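The per-coordinate summary statistics reported in Table 5-2 can be reproduced from the coordinate differences as in the sketch below (our reading: differences are GPS minus extracted, and the "standard error" row matches the population standard deviation of the differences):

```python
def accuracy_stats(gps, extracted):
    """Mean difference, standard error, and RMSE for one coordinate.

    gps and extracted are parallel sequences of coordinate values for
    the measured corners; differences are GPS minus extracted.
    """
    diffs = [g - e for g, e in zip(gps, extracted)]
    n = len(diffs)
    mean = sum(diffs) / n
    variance = sum((d - mean) ** 2 for d in diffs) / n
    std_error = variance ** 0.5
    rmse = (sum(d * d for d in diffs) / n) ** 0.5
    return mean, std_error, rmse
```

Applied to the Weil Hall ΔX column of Table 5-2, this reproduces the reported 0.2254 m mean, 0.1501 m standard error, and 0.2708 m RMSE.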


Table 5-2. RMSE for extracted building corners with respect to GPS measurements

Corner           GPS X [m]     GPS Y [m]     Extracted X [m]  Extracted Y [m]  ΔX [m]    ΔY [m]
Weil Hall
1                369508.1716   3280521.51    369507.772       3280521.577       0.3999   -0.07
2                369520.5735   3280521.39    369520.495       3280521.425       0.0788   -0.0392
3                369520.8935   3280549.05    369520.82        3280549.015       0.074     0.0343
4                369508.4456   3280549.18    369508.097       3280549.167       0.3491    0.0145
Mean                                                                            0.2254   -0.0151
Standard error                                                                  0.1501    0.0416
RMSE                                                                            0.2708    0.0442
Bryant Space Center
1                369729.824    3280620.84    369729.664       3280620.922       0.1599   -0.0867
2                369747.372    3280620.46    369747.248       3280620.651       0.1239   -0.1937
3                369748.352    3280668.64    369748.255       3280668.403       0.0966    0.239
4                369730.806    3280669.03    369730.671       3280668.674       0.1346    0.3529
Mean                                                                            0.1287    0.0779
Standard error                                                                  0.0227    0.225
RMSE                                                                            0.1307    0.2381
Stadium
1                369412.816    3280628.74    369412.93        3280628.658      -0.1142    0.0788
2                369519.457    3280626.96    369519.735       3280627.137      -0.2776   -0.1763
Mean                                                                           -0.1959   -0.0488
Standard error                                                                  0.0817    0.1276
RMSE                                                                            0.2122    0.1366
McCarthy Hall B
1                369800.591    3280419.96    369800.745       3280420.281      -0.1539   -0.319
2                369844.494    3280376.5     369844.341       3280376.223       0.153     0.2814
3                369827.44     3280367.82    369827.627       3280367.669      -0.1865    0.1456
Mean                                                                           -0.0625    0.036
Standard error                                                                  0.1529    0.2571
RMSE                                                                            0.1652    0.2596
McCarthy Hall A
1                369729.041    3280392.15    369728.825       3280391.734       0.2158    0.4155
2                369809.206    3280433.11    369809.255       3280432.892      -0.0494    0.2222
3                369795.855    3280450.95    369795.942       3280450.788      -0.0869    0.165
Mean                                                                            0.0265    0.2676
Standard error                                                                  0.1347    0.1072
RMSE                                                                            0.1373    0.2882
Stadium Ramp
1                369552.847    3280641.72    369552.777       3280641.619       0.0697    0.1012
2                369552.957    3280648.84    369552.898       3280648.797       0.0591    0.0413
Mean                                                                            0.0644    0.0712
Standard error                                                                  0.0053    0.03
RMSE                                                                            0.0646    0.0773


Table 5-2. Continued

Corner           GPS X [m]     GPS Y [m]     Extracted X [m]  Extracted Y [m]  ΔX [m]    ΔY [m]
Weimer Hall
1                369593.018    3280492.18    369593.32        3280492.298      -0.302    -0.1153
2                369642.257    3280491.59    369642.325       3280491.726      -0.068    -0.1331
3                369642.573    3280525.09    369642.773       3280525.125      -0.2002   -0.0397
4                369593.206    3280504.48    369593.483       3280504.47       -0.2773    0.0125
5                369562.053    3280504.85    369561.92        3280504.838       0.1335    0.007
6                369562.207    3280516.54    369562.079       3280516.746       0.1277   -0.2053
Mean                                                                           -0.0977   -0.079
Standard error                                                                  0.1778    0.0791
RMSE                                                                            0.2029    0.1118
Total
Mean                                                                            0.0191    0.0305
Standard error                                                                  0.1867    0.1812
RMSE                                                                            0.1877    0.1837

The accuracy values in the table are close to first-class photogrammetric accuracy values for equivalent flight conditions and better than the values for second-class photogrammetric accuracy.

5.5.3 Comparison with Other Footprint Delineation Methods

To see how our method compares to others, two of the most recently proposed algorithms for footprint delineation were implemented to test their performance on the campus data. Although neither of those methods assesses accuracy, a comparison of the footprint shapes can be done. In fact, as pointed out in Section 5.2, no footprint delineation method from Lidar data alone estimates how accurate the extracted building features may be. So far, as noted by Abdulla (2008), photogrammetric accuracy has not been reached for building footprints and urban features computed from Lidar data alone. By comparing our method with these two methods, the reasons for this can be discerned.

The methods chosen for comparison are a convex-hull-based method over raw Lidar data (Sampath and Shan, 2007) and a boundary detection method over gridded Lidar data (Zhang et


al., 2006). Those methods were explained in Section 5.2, and, like most methods, they include ground, vegetation, and building data segmentation in the main algorithm. Here we will focus only on the definition of the points used for footprint delimitation and on the extraction of the footprint itself, once the segmentation is performed. In general terms, both methods detect boundary points that are then segmented into sets defining linear segments. The segmentation somewhat resembles the definition of dominant points in our algorithm, so points between these "dominant points" are grouped in the same segment. Then, unlike our method, those methods define from the beginning two dominant directions to which lines are adjusted for the point segments. The convex-hull based method uses a modified convex-hull algorithm that detects the points on the building footprint by applying the conventional convex-hull algorithm to a small point neighborhood. The neighborhood is circular, and its diameter is related to the point density of the Lidar data. In that sense, the algorithm is very similar to the alpha-shape algorithm for defining boundary polygons. As seen in Figure 5-46, the definition of the neighborhood diameter determines the boundary points. Smaller neighborhoods give more detail to the boundary, but they also include points that should not be on the boundary. The fact that no actual on-edge points are on the boundary affects the accuracy of any curve adjustment to these points. Once the boundary points are found, the segments for adjustment are obtained by grouping points between the "dominant points", which are the points where a large change of slope occurs. The slope threshold is user-defined. Figure 5-47 shows the definition of the "dominant points" and the adjustment of lines to the boundary points.
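The local convex-hull test described above can be sketched as follows. This is a hypothetical simplification, not Sampath and Shan's implementation: a point is flagged as a boundary point when it lies on the convex hull of its circular neighborhood (collinear hull points are kept so that points along straight edges are retained).

```python
import numpy as np

def _hull(points):
    """Monotone-chain convex hull that keeps collinear boundary points."""
    pts = sorted(map(tuple, points))
    if len(pts) <= 2:
        return pts
    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])
    def build(seq):
        out = []
        for p in seq:
            # Pop only on a strict clockwise turn, so collinear points stay.
            while len(out) >= 2 and cross(out[-2], out[-1], p) < 0:
                out.pop()
            out.append(p)
        return out
    lower, upper = build(pts), build(pts[::-1])
    return lower[:-1] + upper[:-1]

def boundary_points(xy, radius):
    """Flag a point as 'boundary' if it lies on the convex hull of its
    circular neighborhood (a sketch of the local convex-hull idea)."""
    xy = np.asarray(xy, float)
    flags = np.zeros(len(xy), bool)
    for i, p in enumerate(xy):
        nbrs = xy[np.linalg.norm(xy - p, axis=1) <= radius]
        flags[i] = tuple(p) in set(_hull(nbrs))
    return flags
```

On a regular 5 x 5 grid of points with a 1.5-unit radius, the outer ring is flagged and interior points are not, illustrating how the neighborhood diameter controls which points become "boundary".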


Figure 5-46. Definition of boundary points in the convex-hull based algorithm. A) Lidar points of the building shown in Figure 5-28. Red points and blue points correspond to non on-edge points and on-edge points as defined by our algorithm. B) Detail of the building points showing on-edge and non on-edge points. C) Detail of the boundary points as defined by the convex-hull based algorithm. Blue points correspond to the boundary points.

Figure 5-47. Dominant points and adjusted lines. A) Definition of "dominant points". Red points correspond to the boundary points. Blue points correspond to the "dominant points". B) Adjusted lines in red. The adjustment is done by using the points between "dominant points".


As seen in the figure, points on the roof that are not on the edges but make up the boundary may mislead the adjustment. In the specific example of the previous figure, the adjusted lines follow the scan line direction because any scan line intersecting the building edge contributes many points to the boundary set. Also, depending on the relation between the scan lines and the main orientations of the building, the scan line direction can become the dominant direction of the building footprint. The grid-based method uses a simpler procedure to detect boundary points. Because grid data is available, a convolution-based filter marks a building pixel as a boundary pixel if at least one of its neighbors is not a building pixel. The tradeoff is that the Lidar data is represented by grid locations, which result from a simple inclusion query instead of actual Lidar points. This location averaging process further compromises accuracy assessment of the footprint features. Figure 5-48 shows boundary locations for the building shown in Figure 5-28.

Figure 5-48. Dominant points on gridded data. A) Boundary points from processing gridded data. B) "Dominant points" represented by blue dots. The initial boundary is represented by red lines.
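The grid-based boundary test is simple to express with shifted-array neighbor counts. This sketch (the function name `boundary_pixels` is ours, not from Zhang et al.) marks a building pixel as a boundary pixel when any of its eight neighbors is non-building:

```python
import numpy as np

def boundary_pixels(mask):
    """Mark building pixels that touch at least one non-building pixel in
    their 8-neighborhood (a sketch of the convolution-style boundary test)."""
    padded = np.pad(mask, 1, constant_values=False)
    h, w = mask.shape
    shifts = [(di, dj) for di in (-1, 0, 1) for dj in (-1, 0, 1)
              if (di, dj) != (0, 0)]
    # Count building neighbors of every pixel by summing shifted copies.
    counts = sum(padded[1 + di:1 + di + h, 1 + dj:1 + dj + w].astype(int)
                 for di, dj in shifts)
    return mask & (counts < 8)   # some neighbor is not a building pixel
```

For a solid 5 x 5 building block this returns the 16-pixel outer ring and leaves the 3 x 3 interior unmarked.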


One of the parameters to set in this method is the grid size, which is related to the point density: the smaller the density, the larger the pixel size, and therefore the lower the expected accuracy. When the boundary points are obtained, the Douglas-Peucker regularization is applied to obtain "dominant points". The performance of the regularization algorithm depends on the selection of a distance tolerance parameter. The simplified boundary polygon that results from connecting the "dominant points" is shown in Figure 5-48. In addition to the accuracy being misleading, segment directions are constrained to the directions that can be represented in a grid structure. For the building in the previous figure, many edges would be totally vertical (with a zero-valued azimuth) even though the actual directions have azimuth values slightly larger than zero. The results from applying the dominant point detection algorithms to a roof in Weil Hall, one of the buildings shown in Figure 5-44, are shown in Figure 5-49. As can be seen in the figure, the boundary points defined by the grid-based and the convex-hull based methods present a noise level that causes the detection of many dominant points. On the other hand, our method uses only the detected on-edge points to obtain the dominant points necessary to define the footprint shape. For each set of points between consecutive dominant points a line fitting is carried out. The final footprint is made up of all the adjusted lines. The final adjustment for all the methods is based on least squares theory. The main difference between our method and the grid-based and convex-hull based methods is the main direction assumption. We do not assume two main directions from the beginning as the other methods do, but we do perform an additional adjustment when most points lie in two main directions. This way the main direction is a result of our method instead of an initial assumption.
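For reference, the Douglas-Peucker regularization mentioned above keeps a point as "dominant" when it lies farther than the distance tolerance from the chord joining the current segment ends, then recurses on both halves. A minimal sketch:

```python
def douglas_peucker(points, tol):
    """Classic Douglas-Peucker simplification of an open polyline; returns
    the retained ("dominant") vertices."""
    (x0, y0), (xn, yn) = points[0], points[-1]
    dx, dy = xn - x0, yn - y0
    norm = (dx * dx + dy * dy) ** 0.5 or 1.0
    # Perpendicular distance of each interior point to the end-to-end chord.
    dists = [abs(dy * (x - x0) - dx * (y - y0)) / norm for x, y in points[1:-1]]
    if not dists or max(dists) <= tol:
        return [points[0], points[-1]]
    k = 1 + dists.index(max(dists))      # farthest point splits the chain
    return douglas_peucker(points[:k + 1], tol)[:-1] + douglas_peucker(points[k:], tol)
```

A noisy L-shaped chain collapses to its three corner vertices; a larger tolerance would remove the corner as well, which is why the choice of the tolerance parameter drives the number of dominant points detected.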
Figure 5-50 shows the adjusted footprints for each implemented method.


Figure 5-49. Dominant points for different processing algorithms. A) Boundary points for the grid-based method. B) Dominant points (in blue) extracted from processing the boundary points. C) On-edge (in blue) and non on-edge (in red) points used in our method. D) Dominant points (in blue) extracted for our method. E) Dominant points (in blue) extracted with the convex-hull based method. The boundary points (in red) were obtained when the on-edge and non on-edge points shown in C) were processed.

The footprint obtained using the grid-based method has completely vertical edges due to the approximated Lidar point locations extracted from the grid. That orientation does not agree with the actual orientation of those edges on the roof, whereas our method can capture the true orientation (Figure 5-50.C). Also, because of the noisy boundary, some of the resulting lines have erroneous directions. This behavior is also seen in the footprint obtained when the convex-hull based method is used. That method also exhibits the misleading directions seen in Figure 5-47, where many of the edge directions in the extracted footprint follow the scan line direction.

5.5.3 Enhanced Building Description through Segmentation Intermediate Results

Even though footprint delineation is the ultimate result, the building description can be augmented by using some of the data computed when the spin-image based building/non-building segmentation is carried out. This kind of segmentation computes spin images as feature descriptors for each one of the points to be classified. In that process, a local reference frame for each point is fitted to its local surface. The XY planes of the local surfaces are expressed as normal-form equations, with two angles and a distance as the equation parameters. Because for building points those angles describe the local surface on the building roofs, and they are respectively related to the magnitude and orientation of the local slopes, a local description of the building roofs at the location of each point can be obtained.

Figure 5-50. Line and footprint adjustment for the Weil Hall roof using different methods. A) Using the grid-based method. B) Using the convex-hull method. C) Using our method.


The slope magnitude defines the steepness of the local surfaces, and it is expressed in degrees or as a percentage rise. Figure 5-52 shows slope magnitudes for the Weil Hall building shown in Figure 5-51.

Figure 5-51. Weil Hall at the UF Campus

Figure 5-52. Slope magnitude of the Weil Hall roof surfaces. Mostly flat surfaces are located about the dark blue points. The steepest surfaces are located about the brown points. Magnitudes are in degrees; lower values correspond to flatter surfaces.


Figure 5-52 shows three main classes corresponding to mainly flat, intermediately steep, and very steep surfaces. It can be seen that the building roof has two dormer classes with different slope magnitudes. The ones having larger surface area are steeper than the ones with smaller area. In addition to the magnitude, the aspect of the slope gives additional roof information. The slope aspect is the direction in which the magnitude increases the most. For the building shown in Figure 5-51, the slope aspect of the roof is shown in Figure 5-53. In the figure, the roof has gabled parts and triangular dormers identified by slope aspects showing opposite slope directions. Also, it can be seen that most of the surfaces have aspects following the four main directions (north, east, south, and west) within the aspect range. Regarding the extraction of accurate footprints, the results show the effectiveness of the return-under attribute information associated with each point in the cloud. The possibility of detecting on-edge information increases the accuracy of the final footprints. The clustering approach, applied prior to the final adjustments, is able to reject outliers to the edge, improving accuracy values, which reach photogrammetric levels. These accurate footprint values are tied to a richer building description provided by the extraction of parts inside each footprint for complex buildings.

Figure 5-53. Slope aspect of the Weil Hall roof surfaces. Only points on non-flat surfaces are shown. Aspect values range from 0 to 360 degrees, where 0 degrees is the north direction, 90 degrees is east, 180 degrees is south, and 270 degrees is west.
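Given a unit normal for a locally fitted plane, slope magnitude and aspect follow directly. This sketch assumes an east/north/up axis convention and the common downslope-aspect definition, which may differ in detail from the dissertation's implementation:

```python
import math

def slope_and_aspect(normal):
    """Slope magnitude (degrees) and aspect (degrees clockwise from north)
    from the unit normal of a locally fitted plane (x = east, y = north,
    z = up). Aspect is the downslope direction; undefined for flat planes."""
    nx, ny, nz = normal
    slope = math.degrees(math.acos(abs(nz)))    # 0 degrees = flat surface
    if slope == 0.0:
        return slope, None
    # Horizontal projection of the normal points downslope in this convention.
    aspect = math.degrees(math.atan2(nx, ny)) % 360.0
    return slope, aspect
```

A plane dipping 45 degrees toward the east, for example, maps to a slope of 45 degrees and an aspect of 90 degrees, matching the 0-360 degree aspect range used in Figure 5-53.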


It is expected that, in future improvements of the methods presented here, spectral information and three-dimensional linear and non-linear curves will expand the accuracy and shape representation of the footprints.


CHAPTER 6
CONCLUSIONS AND FUTURE WORK

Novel approaches for the classification and the extraction of object features from urban and suburban Lidar data have been presented. The approaches make it easier to carry out classification and extraction tasks in areas where dense vegetation happens to occur very close to, or overhangs, building infrastructure. The introduction of new point features, a spin image-based feature descriptor and the return-under number, constitutes a key aspect of the effectiveness of the mentioned processing tasks.

Classification of Lidar point data into building and non-building classes is based on spin images, which exploit all three dimensions of the point cloud data and allow class separability of spatially overlapping class instances. Because the classification method uses supervised learning of probability distributions, mutual information can serve as the basis of the algorithm for selecting the best spin image parameter set in terms of classification accuracy. After classification, on-edge information, represented by the return-under number attribute, provides a means for obtaining building footprints with accuracy in the range of traditional photogrammetric levels. Below, detailed conclusions and future work for the building/non-building classification, data fusion-based improved classification, and building footprint delineation methods are presented.

6.1 Classification of Building Infrastructure in Airborne Lidar Points Using Spin Images

A novel approach for classifying urban and suburban ALSM data has been defined. First the ALSM point data are classified into ground and non-ground classes. The subsequent classification of the non-ground points is based on a minimal non-parametric learning of the features that separate the two classes, building and non-building (usually trees and other vegetation).
The learning is done over point features instead of pixel-based features, which means that for each one of the points a class decision is made. Because the learning is supervised


and based on probability distribution functions, empirical decision rules on threshold values are avoided. The approach overcomes some of the difficulties faced when separation of Lidar returns from building and vegetation classes is sought. This discrimination, which turns out to be a hard task due to the closeness of trees to building and house roofs, is effectively accomplished through the use of spin images as a three-dimensional feature descriptor of each point neighborhood. If enough points are returned from roofs close to or at the same height as nearby trees, or from roofs even partially or totally covered by vegetation, the descriptor is able to represent the features corresponding to the building roof class. In addition to the use of spin images, an accurate plane fitting for defining a local coordinate frame plays a key role in our approach. The modified 3D Hough transform algorithm has proven to be a robust technique to better fit the spin image reference frame to the local surface. Because the method is based on computation of probability distributions, a method based on mutual information (MI) is proposed to choose the best spin image parameter set. MI is computed between the classes and the training data. Parameter sets defined for training that yield the highest MI values are among those most likely to produce the highest accuracy rates at the classification stage. The flexibility given by the definition of spin image parameter sets, and the dependence of the classification accuracy rates of the two classes on the bin size, allow us to combine classification results at different spin image scales. Overall classification accuracy rates improve when results based on higher resolution spin images, which present high building accuracy rates, are combined with those based on lower resolution images, which show high tree accuracy rates.
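The MI-based parameter scoring can be illustrated with a joint histogram between a feature and the class labels. Here `mutual_information` is an illustrative stand-in that scores a single scalar feature rather than full spin-image parameter sets:

```python
import numpy as np

def mutual_information(features, labels, bins=8):
    """MI in bits between a scalar feature and integer class labels (0..K-1),
    estimated from a joint histogram."""
    f_bin = np.digitize(features, np.histogram_bin_edges(features, bins)[1:-1])
    joint = np.zeros((bins, len(set(labels))))
    for fb, lb in zip(f_bin, labels):
        joint[fb, lb] += 1
    p = joint / joint.sum()                            # joint distribution
    pf, pl = p.sum(1, keepdims=True), p.sum(0, keepdims=True)  # marginals
    nz = p > 0
    return float((p[nz] * np.log2(p[nz] / (pf @ pl)[nz])).sum())
```

A feature that perfectly separates two balanced classes scores 1 bit; a feature independent of the labels scores 0, which is the ordering the parameter selection relies on.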


Due to the use of a local neighborhood for making classification decisions, real-time processing is more feasible. In addition, because of the learning capacity of the method, it could be used to detect other kinds of objects differing from buildings, such as roads, cars, and other infrastructure.

The classification method has extensively used the k-nearest neighbor rule with an l2 norm as the distance metric to select the neighbors. The rule is a good alternative when pdf estimation in high dimensional spaces is used in Bayesian classification. However, the bare l2 norm does not account for the structural relationships between neighboring pixels, and for that reason distances between samples of different classes can be similar to the ones between objects in the same class. A derived distance metric that accounts for those relationships would improve neighborhood definition and hence classification rates. Another improvement in the classification process can be obtained if the distance from the neighbors is used in the posterior probability computation. If the distance is used as a weighting factor in the equation for probability estimation, instead of only counting neighbors belonging to each class, a more accurate estimation, and hence better classification results, could be obtained.

6.2 Improved Classification of Building Infrastructure from Airborne Lidar Data Using Spin Images and Fusion with Ground Based Lidar

The best classification results for the points on building roofs were obtained when M-TLS data from these roofs and from the building walls were fused with ALSM data (experiments 5 and 4). One possible reason for the result in experiment 5 is that the additional geometric information available from points on the roof borders allows a better fitting of the planes using the modified Hough transform. When fusing ALSM data with roof M-TLS data (experiment 4), improvement in the building classification accuracy was also observed. One reason for this


result is that denser sampling of faceted roof structures allows a better fitting of the planes over these structures. Experiment 4 performed much better than experiment 5 over tree points. The experiments showed that high resolution M-TLS data can improve classification over complex roof structures, such as gables and dormers, yielding almost 100% of those points classified correctly. These results were obtained by using training sets from only the ALSM data. Interestingly, classification accuracy for buildings decreased when fused data were used for training, although tree classification accuracy increased (experiment 3). This result suggests that training on fused data is not necessary for accurate segmentation of buildings from trees. This is important, because it implies that training could be done on archived ALSM data that is similar to a site of interest. Without the need to train using fused data, fused data could be classified more rapidly without requiring extensive a priori knowledge of each site. This finding could facilitate the timely classification of urban infrastructure soon after natural disasters, such as earthquakes, to assess damage. Furthermore, training on fused data can actually degrade performance. Downsampling the ground-based data or reducing the spin image size in such cases may mitigate this effect. In addition to the use of spin images, an accurate plane fitting for defining a local coordinate frame plays a key role in our approach. The modified 3D Hough transform has proven to be a robust technique to better fit the spin image reference frame to the local surface. For ALSM data and the fusion of ALSM and M-TLS data, classification results depend on the choice of parameters for the spin image and Hough transform. A rigorous sensitivity analysis of classification performance with respect to these parameters can be carried out in the future.
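Returning to the distance-weighted refinement of the k-nearest neighbor rule suggested in Section 6.1, posterior estimation with inverse-distance votes can be sketched as follows (the function and parameter names are ours, not from the dissertation's implementation):

```python
import numpy as np

def knn_posterior(query, train_x, train_y, k=5, weighted=True):
    """Posterior class probabilities from the k nearest neighbors. With
    weighted=True, neighbors vote with inverse-distance weights instead of
    plain counts -- the refinement suggested in the text."""
    d = np.linalg.norm(train_x - query, axis=1)     # l2 distances
    idx = np.argsort(d)[:k]                         # k nearest neighbors
    w = 1.0 / (d[idx] + 1e-9) if weighted else np.ones(k)
    classes = np.unique(train_y)
    votes = np.array([w[train_y[idx] == c].sum() for c in classes])
    return classes, votes / votes.sum()
```

When a far-away neighbor of the wrong class sneaks into the k-neighborhood, the inverse-distance weighting suppresses its vote, whereas plain counting gives it the same influence as the closest samples.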


6.3 Automatic Building Footprint Delineation from On-edge Return Identification in Lidar Data

A novel approach for building footprint delineation from Lidar data is proposed that exploits multi-return and time tag information in addition to the three-dimensional point locations that prior algorithms used. The definition of a new return-under number value for each pulse helps us infer the presence of on-edge returns (i.e., returns from building edges). The final rectilinear adjustment of the building footprints relies on the use of on-edge information and previous segmentation processing. In many cases an adequate segmentation of building points can be accomplished using the graph-based clustering. However, in cases of severe occlusion by overhanging vegetation, the spin image classification could be used beforehand to improve the building/non-building segmentation results.

The final building footprint adjustment is preceded by the segmentation of edge points into connected edge segments, to which line segments are fit. This is accomplished by iteratively forming linear neighborhoods (seeds) and then applying a least squares cost function. The clustering threshold is based on the pre-determined horizontal accuracy of the Lidar data set. Because the pre-processing and segmentation steps of our method only need user interaction for training, the proposed method is effectively automatic. Moreover, because the training can be done beforehand on Lidar data with similar characteristics, application of the proposed method can be made totally automatic.

Accuracies obtained for the measured building features using only Lidar data are only slightly worse than those expected for Class 1 photogrammetric products obtained at the same flying altitude; however, they are better than the accuracies expected for Class 2 and Class 3 photogrammetric products.
To our knowledge, this is the highest level of accuracy yet achieved in the extraction of building footprints solely from Lidar data from comparable altitudes.
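The seed-and-grow segmentation of edge points into linear runs described in this section can be sketched as follows. The orthogonal line fit here uses an SVD, and `tol` stands in for the pre-determined horizontal accuracy of the data set; the details are our simplification, not the dissertation's implementation:

```python
import numpy as np

def grow_segments(points, tol, seed=3):
    """Group ordered edge points into near-linear runs: start a seed of
    `seed` points, then extend while every point stays within `tol` of the
    orthogonally fitted line. A sketch of the seed-and-grow idea."""
    pts = np.asarray(points, float)
    segments, start = [], 0
    while start < len(pts):
        end = min(start + seed, len(pts))
        while end < len(pts):
            seg = pts[start:end + 1]
            c = seg - seg.mean(0)
            # Total least squares: the line direction is the leading right
            # singular vector; vt[1] is the normal direction.
            _, _, vt = np.linalg.svd(c, full_matrices=False)
            resid = np.abs(c @ vt[1])        # perpendicular distances
            if resid.max() > tol:
                break                        # next point breaks linearity
            end += 1
        segments.append(pts[start:end])
        start = end
    return segments
```

On an L-shaped chain of edge points the growth stops at the corner, producing one segment per building edge, each ready for the final least squares line adjustment.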


The use of majority filters in the horizontal (2D) and again in 3D helps the method go beyond simple building footprint delineation. As we observed from the results on complex multilevel buildings on campus, the identification of roof surfaces on the same building but at different elevations provides a richer description of such buildings and their complexity. Although non-linear building edges can be delineated using the proposed method, the result is sub-optimal because the fit is piece-wise linear. For applications where non-linear building edges are expected to be prevalent, non-linear functions such as splines or higher order polynomial segments could be used. In the case of very complex non-planimetric building geometries, it should be possible to define geometric primitives that are not constrained to lie in a plane for better descriptions of the building features in 3D. Finally, the limited spectral information in each laser pulse might enable refined segmentation by incorporating material properties rather than solely using geometric information.


197 APPENDIX PERPENDICULAR DISTANCE LEAST SQUARES LINE FITTING The distance from a line y=a+ bx to a point pi(xi,yi) is 21 | ) ( | b bx a y di i i (1) If those distances are seen as the residuals of n points fitting a line, the function that when minimized for looking the line parameters a and b yields the line parameters is n i n i i i ib bx a y d R11 21 | ) ( | (2) Because the absolute function does not have continues derivatives, we can look for minimizing instead n i i ib bx a y R1 2 2 21 )] ( [ (3) By finding derivatives of (3) with resp ect to the line parameters we have 0 )] ( [ 2 ] 1 [ )] ( [ 2 1 )] ( [1 1 1 2 2 2 n i i i n i i i n i i ibx a y bx a y b bx a y a a R 0 )] ( [1 n i i ibx a y (4) ] [ )] ( [ 2 1 1 1 )] ( [1 2 1 2 2 2 i n i i i n i i ix bx a y b b bx a y b b R n i i ibx a y b b1 2 2)] ( [ 1 2 0 )] ( [ ] [ )] ( [ ) 1 ( ) 1 ( 21 2 1 2 2 2 n i i i i n i i ibx a y b x bx a y b b 0 )] ( [ ] [ )] ( [ ) 1 (1 2 1 2 n i i i i n i i ibx a y b x bx a y b (5) From (4) we have n i i n i i n i i n i n i i n i i ibx y an bx a y bx a y1 1 1 1 1 1)] ( [

PAGE 198

198 x b y n bx y an i i n i i 1 1 (6) Now, because 2 2 2 2 2 2 22 2 2 ) ( ) ( 2 )] ( [ x b abx a bxy ay y bx a bx a y y bx a y (5) becomes 0 ] 2 2 2 [ ] [ ) 1 (1 2 2 2 2 1 2 2 n i i i i i i i n i i i i ix b abx a y bx ay y b bx ax x y b n i i n i i n i i n i i iy b x b b b x ab a b x y b b1 2 1 2 1 3 2 2 2 1 2 2] ) 1 ( [ ] 2 ) 1 ( [ ] 2 ) 1 [( 0 22 1 bn a y abn i i 0 2 ) 1 ( ) 1 (2 1 1 2 1 2 1 2 1 2 bn a y ab y b x b x b a x y bn i i n i i n i i n i i n i i i (7) (6) in (7) 0 ) ( ) ( 2 ) 1 )( ( ) 1 (2 1 1 2 1 2 1 2 1 2 bn x b y y b x b y x b x b y y b x b x y bn i i n i i n i i n i i n i i i n x x b b n y x b y b x b x y b x yn i i n i i n i i n i i n i i n i i n i i i n i i i 1 1 2 1 1 2 1 2 1 2 1 2 1) 1 ( ) 1 ( 0 ) ( 2 ) ( 2 22 1 3 1 1 2 2 1 1 1 2 1 1 n x b n y x b n y b n y x b n y y bn i i n i i n i i n i i n i i n i i n i i n i i n x x b n y x n y x b y b x b x y b x yn i i n i i n i i n i i n i i n i i n i i n i i n i i i n i i i 1 1 3 1 1 1 1 2 1 2 1 2 1 2 1 n y x b n y b n y x b n y y b n x x bn i i n i i n i i n i i n i i n i i n i i n i i n i i 1 1 2 2 1 1 1 2 1 1 1 12 ) ( 2 2 0 ) (2 1 3 n x bn i i

PAGE 199

199 n y n y y n x x y x b x y n y x bn i i n i i n i i n i i n i i n i i n i i n i i i n i i n i i2 1 1 1 1 1 1 2 1 2 1 1 1 2) ( 2 01 1 1 n i i i n i i n i ix y n y x 0 1 ) (1 1 1 2 1 1 1 1 2 1 2 2 n i i i n i i n i i n i i n i i n i i n i i n i ix y n y x n y n x x y x b b 0 1 ) ( ) (1 2 1 1 2 2 1 1 2 2 n i i i n i i n i i n i i n i ix y y x n n x x n y y b b 0 11 2 1 2 2 1 2 2 n i i i n i i n i ix y y x n x n x y n y b b By making n i i i n i i n i ix y y x n x n x y n y k1 2 1 2 2 1 2 We can compute 12 k k b

PAGE 200

200 From the two possible b values, the one mini mizing (3) is the final parameter b. The parameter a is found by replacing b in (6) Perpendicular distance least square s line fitting for 2 parallel lines The parallel line in this case has the same slope (b) but different intercept (c). According to this, in addition to (1) we have 21 | ) ( | b bv c w dj j j (8) And (3) becomes m j j j n i i ib bv c w b bx a y R1 2 2 1 2 2 21 )] ( [ 1 )] ( [ (9) By finding derivatives of (9) with resp ect to the line parameters we have m j j j n i i ib bv c w b bx a y a a R1 2 2 1 2 2 21 )] ( [ 1 )] ( [ 0 0 )] ( [1 n i i ibx a y (10) m j j j n i i ib bv c w b bx a y c c R1 2 2 1 2 2 21 )] ( [ 1 )] ( [ 0 0 )] ( [1 m j j jbv c w (11) m j j j n i i ib bv c w b bx a y b b R1 2 2 1 2 2 21 )] ( [ 1 )] ( [ 0 1 )] ( [ 1 )] ( [1 2 2 1 2 2 m j j j n i i ib bv c w b b bx a y b n i i i i n i i ibx a y b x bx a y b b1 2 1 2 2 2)] ( [ ] [ )] ( [ ) 1 ( ) 1 ( 2 0 )] ( [ ] [ )] ( [ ) 1 ( ) 1 ( 21 2 1 2 2 2 m j j j j m j j jbv c w b v bv c w b b n i i i i n i i ibx a y b x bx a y b1 2 1 2)] ( [ ] [ )] ( [ ) 1 (

PAGE 201

201 0 )] ( [ ] [ )] ( [ ) 1 (1 2 1 2 m j j j j m j j jbv c w b v bv c w b (12) From (10) and (11) we have x b y a (13) v b w c (14) Now, because 2 2 2 2 22 2 2 )] ( [ x b abx a bxy ay y bx a y and 2 2 2 2 22 2 2 )] ( [ v b bcv c bvw cw w bv c w (12) becomes bn a y ab y b x b x b a x y bn i i n i i n i i n i i n i i i 2 1 1 2 1 2 1 2 1 22 ) 1 ( ) 1 ( 0 2 ) 1 ( ) 1 (2 1 1 2 1 2 1 2 1 2 bm c w bc w b v b v b c v w bm j i m j j m j j m j j m j j j (15) (13) and (14) in (15) bn x b y y b x b y x b x b y y b x b x y bn i i n i i n i i n i i n i i i 2 1 1 2 1 2 1 2 1 2) ( ) ( 2 ) 1 )( ( ) 1 ( m j j m j j m j j m j j m j j jw b v b w v b v b w w b v b v w b1 1 2 1 2 1 2 1 2) ( 2 ) 1 )( ( ) 1 ( 0 ) (2 bm v b w n x x b b n y x b y b x b x y b x yn i i n i i n i i n i i n i i n i i n i i i n i i i 1 1 2 1 1 2 1 2 1 2 1 2 1) 1 ( ) 1 ( n x b n y x b n y b n y x b n y y bn i i n i i n i i n i i n i i n i i n i i n i i 2 1 3 1 1 2 2 1 1 1 2 1 1) ( 2 ) ( 2 2 m v v b b m v w b w b v b v w b v wm j j m j j m j j m j j m j j m j j m j j j m j j j 1 1 2 1 1 2 1 2 1 2 1 2 1) 1 ( ) 1 ( 0 ) ( 2 ) ( 2 22 1 3 1 1 2 2 1 1 1 2 1 1 m v b m w v b m w b m w v b m w w bm j j m j j m j j m j j m j j m j j m j j m j j

PAGE 202

202 n x x b n y x n y x b y b x b x y b x yn i i n i i n i i n i i n i i n i i n i i n i i n i i i n i i i 1 1 3 1 1 1 1 2 1 2 1 2 1 2 1 n x b n y x b n y b n y x b n y y b n x x bn i i n i i n i i n i i n i i n i i n i i n i i n i i n i i 2 1 3 1 1 2 2 1 1 1 2 1 1 1 1) ( 2 ) ( 2 2 m v v b m w v m w v b w b v b v w b v wm j j m j j m j j m j j m j j m j j m j j m j j m j j j m j j j 1 1 3 1 1 1 1 2 1 2 1 2 1 2 1 m w v b m w b m w v j b m w w b m v v bm j j m j j m j j m j j m j j m j j m j j m j j m j j 1 1 2 2 1 1 1 2 1 1 1 12 ) ( 2 2 0 ) (2 1 3 m v bm j j n y n y y n x x y x b x y n y x bn i i n i i n i i n i i n i i n i i n i i n i i i n i i n i i 2 1 1 1 1 1 1 2 1 2 1 1 1 2) ( 2 n i i i n i i n i ix y n y x1 1 1 m w m w w m v v w v b v w m w v bm j j m j j m j j m j j m j j m j j m j j m j j j m j j m j j 2 1 1 1 1 1 1 2 1 2 1 1 1 2) ( 2 01 1 1 m j j j m j j m j jv w m w v m j j j m j j m j j n i i i n i i n i i m j j m j j m j j m j j m j j n i i n i i n i i n i i n i iv w m w v x y n y x m w m v v w v n y n x x y x b1 1 1 1 1 1 2 1 1 1 1 2 1 2 2 1 1 1 1 2 1 2) ( ) ( 0 12 b

PAGE 203

203 m j j j n i i i m j j m j j m j j m j j n i i n i i n i i n i iv w w v m x y y x n m v v m w w n x x n y y b1 1 2 1 1 2 2 1 1 2 2 1 1 2 2 1 1 2) ( ) ( ) ( ) ( 0 12 b 0 11 1 2 1 2 2 1 2 2 1 2 2 1 2 2 m j j j n i i i m j j m j j n i i n i iv w w v m x y y x n v m v w m w x n x y n y b b By making m j j j n i i i m j j m j j n i i n i iv w w v m x y y x n v m v w m w x n x y n y k1 1 2 1 2 2 1 2 2 1 2 2 1 2 We can compute 12 k k b From the two possible b values, the one mini mizing (9) is the final parameter b. The parameter a and c are found by re placing b in (13) and (14) Perpendicular distance least squares lin e fitting for 2 perpendicular lines Compared with the parallel lines case, the perp endicular line in this case has a slope value of -1/b to (1) we have 2 11 | ) ( | b v b c w dj j j (16) And (3) becomes

PAGE 204

204 m j j j n i i ib v b c w b bx a y R1 2 2 1 1 2 2 21 )] ( [ 1 )] ( [ (17) By finding derivatives of (17) with resp ect to the line parameters we have m j j j n i i ib v b c w b bx a y a a R1 2 2 1 1 2 2 21 )] ( [ 1 )] ( [ 0 0 )] ( [1 n i i ibx a y (18) m j j j n i i ib v b c w b bx a y c c R1 2 2 1 1 2 2 21 )] ( [ 1 )] ( [ 0 0 )] ( [1 1 m j j jv b c w (19) m j j j n i i ib v b c w b bx a y b b R1 2 2 1 1 2 2 21 )] ( [ 1 )] ( [ 0 1 )] ( [ 1 )] ( [1 2 2 1 1 2 2 m j j j n i i ib v b c w b b bx a y b n i i i i n i i ibx a y b b x bx a y b b R1 2 2 1 2 2)] ( [ 1 2 ] [ )] ( [ 2 1 1 ] [ ] 1 )][ ( [ 2 1 12 1 1 2 j m j j jv b v b c w b 0 )] ( [ ) 2 ( ) 1 )( 1 (1 2 1 3 2 2 m j j jv b c w b b Because 1 2 2 2 1 2 2 1 2 2 2 1 2) 1 ( 2 2 ) 1 ( 2 ] ) 1 [( 2 1 b b b b b b b b b and ) 2 ( ) 1 )( 1 ( ) 2 ( ] ) 1 )[( 1 ( ) 2 ( ) 1 )( 1 (3 4 2 2 3 2 2 2 3 2 2 b b b b b b b b b b2 2) 1 ( 2 n i i i i n i i ibx a y b x bx a y b b1 2 1 2 2 2)] ( [ ] [ )] ( [ ) 1 ( ) 1 ( 2 0 )] ( [ ] [ )] ( [ ) 1 ( ) 1 ( 21 2 1 1 1 2 2 2 m j j j j m j j jv b c w b v v b c w b b n i i i i n i i ibx a y b x bx a y b1 2 1 2)] ( [ ] [ )] ( [ ) 1 (

PAGE 205

Setting the partial derivative of (17) with respect to $b$ equal to zero yields

\[
(1+b^2)\left[\sum_{i=1}^{n} x_i\,(y_i - a - b x_i) + \sum_{j=1}^{m} v_j\,(w_j - c - b v_j)\right]
+ b\left[\sum_{i=1}^{n} (y_i - a - b x_i)^2 + \sum_{j=1}^{m} (w_j - c - b v_j)^2\right] = 0 \qquad (20)
\]

From (18) and (19) we have

\[ a = \bar{y} - b\,\bar{x} \qquad (21) \]

\[ c = \bar{w} - b\,\bar{v} \qquad (22) \]

Substituting (21) and (22) into (20), each residual can be written in centered form, e.g. $y_i - a - b x_i = (y_i - \bar{y}) - b\,(x_i - \bar{x})$. Expanding the squares and collecting terms in powers of $b$, the cubic terms cancel and (20) reduces to

\[
(S_{xy} + S_{vw})\,b^2 - (S_{yy} + S_{ww} - S_{xx} - S_{vv})\,b - (S_{xy} + S_{vw}) = 0 \qquad (23)
\]

where

\[
S_{xx} = \sum_{i=1}^{n} x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)^2, \qquad
S_{yy} = \sum_{i=1}^{n} y_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} y_i\right)^2, \qquad
S_{xy} = \sum_{i=1}^{n} x_i y_i - \frac{1}{n}\sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i,
\]

and $S_{vv}$, $S_{ww}$, $S_{vw}$ are defined analogously over the $m$ points $(v_j, w_j)$.

Dividing (23) by $S_{xy} + S_{vw}$ gives the quadratic $b^2 - k\,b - 1 = 0$. By making

\[
k = \frac{S_{yy} + S_{ww} - S_{xx} - S_{vv}}{S_{xy} + S_{vw}}
\]

we can compute

\[
b = \frac{k \pm \sqrt{k^2 + 4}}{2}
\]

From the two possible values of $b$, the one minimizing (17) is the final slope parameter $b$. The parameters $a$ and $c$ are then found by replacing $b$ in (21) and (22).
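The closed-form solution above can be checked numerically. The following is a minimal Python sketch (the function name `fit_parallel_lines` is hypothetical, not part of the dissertation's software); it assumes the degenerate case $S_{xy} + S_{vw} = 0$ (a vertical or horizontal common direction) does not occur:

```python
import math

def fit_parallel_lines(xs, ys, vs, ws):
    """Fit two parallel lines y = a + b*x and w = c + b*v to two point
    sets by minimizing the orthogonal-distance objective (17)."""
    n, m = len(xs), len(vs)
    # Centered sums of squares and cross-products for each point set.
    Sxx = sum(x * x for x in xs) - sum(xs) ** 2 / n
    Syy = sum(y * y for y in ys) - sum(ys) ** 2 / n
    Sxy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    Svv = sum(v * v for v in vs) - sum(vs) ** 2 / m
    Sww = sum(w * w for w in ws) - sum(ws) ** 2 / m
    Svw = sum(v * w for v, w in zip(vs, ws)) - sum(vs) * sum(ws) / m
    # Quadratic b^2 - k*b - 1 = 0 from equation (23).
    k = (Syy + Sww - Sxx - Svv) / (Sxy + Svw)
    candidates = [(k + math.sqrt(k * k + 4)) / 2,
                  (k - math.sqrt(k * k + 4)) / 2]

    def cost(b):
        # Objective (17) with a, b substituted from (21) and (22).
        a = sum(ys) / n - b * sum(xs) / n
        c = sum(ws) / m - b * sum(vs) / m
        return (sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))
                + sum((w - c - b * v) ** 2 for v, w in zip(vs, ws))) / (1 + b * b)

    # Of the two roots, keep the one that minimizes (17).
    b = min(candidates, key=cost)
    a = sum(ys) / n - b * sum(xs) / n
    c = sum(ws) / m - b * sum(vs) / m
    return a, b, c
```

For points lying exactly on y = 2 + 0.5x and w = 5 + 0.5v, the fit recovers a = 2, b = 0.5, c = 5 to machine precision, confirming the sign conventions in (23).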


BIOGRAPHICAL SKETCH

Jhon J. Caceres was born in Bucaramanga, Colombia. He received a B.S. degree in computer engineering in 1995 and an M.S. degree in computer science in 1999 from the Universidad Industrial de Santander. From 1999 to 2003 he was a professor and a research assistant at the Universidad Industrial de Santander. He completed his work for a Ph.D. degree in 2008 at the University of Florida.