
Application of LIDAR Intensity Measurements for Beach Zone Segmentation

Permanent Link: http://ufdc.ufl.edu/UFE0021837/00001

Material Information

Title: Application of LIDAR Intensity Measurements for Beach Zone Segmentation
Physical Description: 1 online resource (59 p.)
Language: english
Creator: Vemula, Raghavendra K
Publisher: University of Florida
Place of Publication: Gainesville, Fla.
Publication Date: 2007

Subjects

Subjects / Keywords: beach, classification, intensity, lidar
Civil and Coastal Engineering -- Dissertations, Academic -- UF
Genre: Civil Engineering thesis, M.S.
bibliography   ( marcgt )
theses   ( marcgt )
government publication (state, provincial, territorial, dependent)   ( marcgt )
born-digital   ( sobekcm )
Electronic Thesis or Dissertation

Notes

Abstract: A coast represents one of the most important boundary zones on our planet, marking the dynamic, three-way interface between land, sea and the overlying atmosphere. According to the Coastal Zone Management Act (107th Congress), over 60% of all Americans live within 50 miles of the Atlantic and Pacific Oceans, the Gulf of Mexico and the five Great Lakes. The population density of these areas is four times the national average, and the coastal population is expected to grow by 15% during the next two decades. With this rise in population, there has been a tremendous increase in the competing uses of coastal resources. Moreover, sea level has risen approximately 130 m in the past 17,500 years. More abundant greenhouse gases in the atmosphere may be increasing the earth's average temperature and may, yet again, accelerate global sea level rise, eventually inundating much of today's coastal regions. Current projections for U.S. population growth in coastal regions suggest accelerating losses of wetlands, which are being destroyed by erosion, dredge and fill, eutrophication, impoundments and excessive turbidity and sedimentation. Continued deterioration of the physical condition of these regions may lead to the collapse of coastal ecosystems. Documentation and continual monitoring are needed for the conservation and effective management of coastal regions. With the advancement of technology, the techniques used for handling, analyzing and modeling information about the locations of phenomena and features on the earth's surface have evolved, leading to faster and more efficient methodologies. In the past, coastal engineers and scientists have developed sophisticated methods of generalization and abstraction to deal with problems related to mapping coastal regions. A relatively new optical remote sensing technology known as LIDAR (Light Detection and Ranging) significantly simplifies the formerly expensive and tedious process of surface elevation surveying. Other terms for LIDAR include ALSM (Airborne Laser Swath Mapping), Airborne Laser Mapping and Laser altimetry. The research presented in this thesis is an attempt to use LIDAR measurements to model the coastal terrain and the corresponding intensity measures to classify a beach zone into three classes, namely dry sand, wet sand and water. The effects of the atmosphere and of the target's surface material on the laser beam are discussed. Two simple classifiers, one supervised and the other unsupervised, are used to segment the data into different information classes, and the performance accuracies are presented.
General Note: In the series University of Florida Digital Collections.
General Note: Includes vita.
Bibliography: Includes bibliographical references.
Source of Description: Description based on online resource; title from PDF title page.
Source of Description: This bibliographic record is available under the Creative Commons CC0 public domain dedication. The University of Florida Libraries, as creator of this bibliographic record, has waived all rights to it worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law.
Statement of Responsibility: by Raghavendra K Vemula.
Thesis: Thesis (M.S.)--University of Florida, 2007.
Local: Adviser: Shrestha, Ramesh L.
Local: Co-adviser: Slatton, Kenneth C.
Electronic Access: RESTRICTED TO UF STUDENTS, STAFF, FACULTY, AND ON-CAMPUS USE UNTIL 2009-12-31

Record Information

Source Institution: UFRGP
Rights Management: Applicable rights reserved.
Classification: lcc - LD1780 2007
System ID: UFE0021837:00001


Full Text

PAGE 1

1 APPLICATION OF LIDAR INTENSITY MEASUREMENTS FOR BEACH ZONE SEGMENTATION By RAGHAVENDRA KUMAR VEMULA A THESIS PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE UNIVERSITY OF FLORIDA 2007

PAGE 2

2 © 2007 Raghavendra Kumar Vemula

PAGE 3

ACKNOWLEDGMENTS

I would like to thank my professors, Ramesh Shrestha, Clint Slatton and William Carter, for giving me the freedom to pursue this area of research while at the same time providing guidance, and for their continued support throughout. I would like to thank Michael Starek for collaborating with me on some previous work and for the valuable feedback he has provided. I would also like to thank Michael Sartori for helping in many technical aspects of LIDAR, and finally I would like to thank Thelma Epperson for all the help she has offered during the course of my study.

PAGE 4

TABLE OF CONTENTS

ACKNOWLEDGMENTS 3
LIST OF TABLES 6
LIST OF FIGURES 7
ABSTRACT 8

CHAPTER
1 INTRODUCTION 10
    Background 10
    Previous Research 12
2 ALSM TECHNOLOGY 14
    Basic Principle and Technology 14
    UFL ALSM System Configuration 15
    Intensity or Spectral Reflectance 16
        Atmospheric Interactions 16
        Surface Material Reflectance 16
        Atmospheric Correction 18
        Topographic Correction 18
3 STUDY AREA AND DATA DESCRIPTION 23
    Location 23
    Data Processing 23
        Coordinate Conversions 24
        Rasterization 25
        Ground Truth Generation 26
4 FEATURE SELECTION FOR SEGMENTATION 30
    Automated Profiling 30
    Feature Extraction 30
5 CLASSIFICATION METHODS 38
    Bayes Classifier 38
    K-means Clustering Algorithm 39

PAGE 5

6 CLASSIFICATION PERFORMANCE 41
    Accuracy Assessment 42
    Confusion Matrix 42
    Kappa Coefficient 44
    Misclassified Pixels 45
7 CONCLUSION 55
LIST OF REFERENCES 57
BIOGRAPHICAL SKETCH 59

PAGE 6

LIST OF TABLES

2-1 UF's standard ALSM survey parameters for St. Augustine, FL flights 20
2-2 MS4100 multi-spectral camera specifications 20
4-1 Dry sand vs. wet sand separability rankings 36
4-2 Wet sand vs. water separability rankings 36
6-1 Confusion matrix for Bayes classifier 47
6-2 Confusion matrix for Bayes classifier after applying a majority filter (4 neighbors) 47
6-3 Confusion matrix for Bayes classifier after applying a majority filter (8 neighbors) 48
6-4 Confusion matrix for K-means classifier 48
6-5 Confusion matrix for K-means classifier after applying a majority filter (4 neighbors) 49
6-6 Confusion matrix for K-means classifier after applying a majority filter (8 neighbors) 49

PAGE 7

LIST OF FIGURES

2-1 LIDAR system components 21
2-2 Absorption spectrum of the earth's atmosphere 21
2-3 Typical spectral reflectance (%) for three materials 22
2-4 Changes in path length due to elevation changes 22
3-1 Location of the study area near St. Augustine beach 28
3-2 Test area polygon and manually digitized break-lines 28
3-3 Overlay of original break-lines and lines digitized by person 2 29
4-1 Digital elevation model and aerial imagery of the study area 36
4-2 Class-conditional PDFs estimated using Parzen windowing for median intensity 37
4-3 Example of averaged ROC curves 37
6-1 Histograms of the three classes for training data 50
6-2 PDFs and decision boundaries 50
6-3 Classification using Bayes classifier 51
6-4 Dry sand misclassified as wet sand 52
6-5 Wet sand misclassified as dry sand 53
6-6 Classification using k-means classifier 54

PAGE 8

Abstract of Thesis Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Master of Science

APPLICATION OF LIDAR INTENSITY MEASUREMENTS FOR BEACH ZONE SEGMENTATION

By Raghavendra Kumar Vemula
December 2007
Chair: Ramesh Shrestha
Major: Civil Engineering

A coast represents one of the most important boundary zones on our planet, marking the dynamic, three-way interface between land, sea and the overlying atmosphere. According to the Coastal Zone Management Act (107th Congress), over sixty percent of all Americans live within 50 miles of the Atlantic and Pacific Oceans, the Gulf of Mexico and the five Great Lakes. The population density of these areas is four times the national average, and the coastal population is expected to grow by 15 percent during the next two decades. With this rise in population, there has been a tremendous increase in the competing uses of coastal resources. Moreover, sea level has risen approximately 130 m in the past 17,500 years. More abundant greenhouse gases in the atmosphere may be increasing the earth's average temperature and may, yet again, accelerate global sea level rise, eventually inundating much of today's coastal regions. Current projections for U.S. population growth in coastal regions suggest accelerating losses of wetlands, which are being destroyed by erosion, dredge and fill, eutrophication, impoundments and excessive turbidity and sedimentation. Continued deterioration of the physical condition of these regions may lead to the collapse of coastal ecosystems. Documentation and continual monitoring are needed for the conservation and effective management of coastal regions.

PAGE 9

With the advancement of technology, the techniques used for handling, analyzing and modeling information about the locations of phenomena and features on the earth's surface have evolved, leading to faster and more efficient methodologies. In the past, coastal engineers and scientists have developed sophisticated methods of generalization and abstraction to deal with problems related to mapping coastal regions. A relatively new optical remote sensing technology known as LIDAR (Light Detection and Ranging) significantly simplifies the formerly expensive and tedious process of surface elevation surveying. Other terms for LIDAR include ALSM (Airborne Laser Swath Mapping), Airborne Laser Mapping and Laser altimetry.

The research presented in this thesis is an attempt to use LIDAR measurements to model the coastal terrain and the corresponding intensity measures to classify a beach zone into three classes, namely dry sand, wet sand and water. The effects of the atmosphere and of the target's surface material on the laser beam are discussed. Two simple classifiers, one supervised and the other unsupervised, are used to segment the data into different information classes, and the performance accuracies are presented.

PAGE 10

CHAPTER 1
INTRODUCTION

Background

Although a few pioneering articles relating to the application of GIS to coastal problems do occur in academic journals and other official publications, most of the early literature is restricted to government reports, conference presentations (e.g., Eberhart and Dolan, 1980) and the grey literature. Much of this early literature was primarily concerned with remote sensing of coastal waters, and focused on the acquisition of data from satellite imagery, with GIS being a largely implied later part of the data processing operation. Interest in applying GIS to the coast first emerged in the mid-1970s, at a time when GIS itself was still a very new technology. The initial pioneering applications of GIS to the coast mostly focused on the ability to store and retrieve data (Bartlett, 1993a). Most only used spatial concepts such as absolute location or relative position in a limited way, and also tended to have somewhat limited graphics capabilities. In part, the early developments of coastal GIS were frequently constrained by the hardware and software then available, and also by generally low levels of awareness among researchers regarding the capabilities of a GIS. To this day, however, coastal process modeling largely remains a separate branch of computing, and most such modeling is at best only loosely coupled to GIS.

Most commercial GIS software products are developed for land-based applications and are built around cartographic metaphors, data models and fundamental paradigms optimized for conditions found in terrestrial environments. More often than not, these models are poorly suited to data pertaining to coastal regions, where boundaries between key variables are less easy to define, where a much greater range of spatial scales and resolutions is required, where there is a greater need to work in three spatial dimensions and where the temporal dimension is

PAGE 11

also fundamental to many analyses. Despite the many advances in technology, coastal zone GIS remains relatively immature, with very few published models available to provide guidance. It is therefore necessary for researchers to develop their own approaches and techniques as suited to their projects.

Various techniques are available for the acquisition of terrain source data, such as photogrammetry, radargrammetry, SAR interferometry and LIDAR. Photogrammetry is one of the most widely used mapping techniques; it has been estimated that 85% of all topographic maps have been produced by photogrammetric techniques using aerial photographs. Developments in instruments, materials, and methods have eliminated or rescheduled fieldwork, saving much time and money. Photogrammetric mapping requires adequate ground control, both horizontal and vertical, to establish correct scale, position, orientation and the level datum. Mapping quality, that is, clarity, accuracy and stability sufficient for distance measurement, is the chief requisite of aerial photographs intended for aero-triangulation and basic compilation. Aerial photographs already available are often unsuitable for actual mapping for various reasons.

Airborne LIDAR is emerging as an attractive alternative for capturing large-scale geospatial data. Because it is an active illumination sensor, a laser system can collect data during the day or at night and can be operated at low sun angles that prohibit aerial photography. Remote areas can be surveyed easily and quickly because each XYZ point is individually geo-referenced, and aerial triangulation or ortho-rectification of data is not required (Flood and Gutelius, 1997).

The surface terrain data that correspond to a beach are used by engineers and researchers for various purposes such as shoreline delineation, shoreline change monitoring, volumetric analyses and others. Removal of portions of the data that are not relevant to the operation is crucial. For example, in a volume change computation between two temporally spaced datasets, if the pixels

PAGE 12

that represent water (if any) are not removed, the resulting statistics would be erroneous. Also, segmentation of a beach zone aids in high-resolution morphology studies. Traditionally, manual digitization is used to delineate pixels that represent water. In this analysis, ALSM data acquired by the GEM (Geosensing Engineering and Mapping) Research Center at the University of Florida are used to analyze the relevance of intensity as a classification feature in a coastal zone.

The research presented here is an extension of the work done by Starek et al (2007), who used intensity as a feature for image classification. Eight intensity-based features were mined from the intensity profiles extracted from ALSM data collected along a beach and partitioned into three classes to detect the water line. Class-conditional probability density functions were estimated for each feature and their inter-class separation ranked. Results indicated that ALSM intensity measures do provide useful information for image classification and noted that mean intensity was one of the highest ranking features, being most informative. As an addition to that work, intensity is used here as a parameter for two classifiers, a simple Bayes (supervised) classifier and an iterative k-means clustering method (unsupervised), to classify the pixels in an intensity raster. To assess the classification accuracy, a confusion (or error) matrix tool is used.

Previous Research

Since the development of the LIDAR methodology, researchers have used LIDAR measurements as a feature for various classification problems. Many of the methodologies can be traced back to the extensive research that has been done on extracting features from data acquired using photogrammetric mapping techniques. However, a majority of research on ALSM data has concentrated on areas such as building extraction and land cover classification. Song et al (2002) assessed the possibility of land cover classification using LIDAR intensity data. They assessed the separability of intensity data on some classes including asphalt

PAGE 13

road, grass, house roofs, and trees. They found that intensity does follow relative magnitudes of reflectance and concluded that LIDAR intensity data could be used for land-cover classification. Lutz et al (2003) investigated the use of laser intensity on glacial surfaces and determined that it can be effectively employed to identify surface characteristics and surface classes of glacial regions. Hu et al (2004) demonstrated the potential and power of using LIDAR data to extract information from complicated image scenes by developing an automatic road extraction method. Luzum et al (2004) provided a systematic method for ranking features to be used for classification and found highly effective classifications of buildings and trees from ALSM features derived from intensity.

PAGE 14

CHAPTER 2
ALSM TECHNOLOGY

The use of the laser as a remote sensing instrument has an established history that can be traced back more than 30 years. During the 1960s and 1970s, various experiments demonstrated the power of using lasers in remote sensing, including lunar laser ranging, satellite laser ranging, atmospheric monitoring and oceanographic studies (Flood 2001). The advancement of technology in terms of reliability and resolution over the past decades contributed to the wide acceptance of LIDAR as a significant tool for remote sensing, surveying and mapping problems. The effectiveness of LIDAR has been demonstrated by a number of applications where traditional photogrammetric methods have failed or proven to be expensive, for example, the acquisition of three-dimensional urban data, areas with dense vegetation or the surveying of power lines.

Basic Principle and Technology

ALSM is a complex integrated system consisting of a laser range finder, a computer system to control the on-line data acquisition, a storage medium, a laser scanner, a kinematic GPS (Global Positioning System) receiver and an IMU (Inertial Measurement Unit) system for determining the orientation and position of the system. The basic scanning principle is illustrated in Figure 2-1. Since LIDAR is an active sensing system, it sends out electromagnetic energy and records the energy scattered back from the terrain surface and objects on the terrain surface. The operational wavelength of many LIDAR systems is just above the visual range of the electromagnetic spectrum, about 1064 nm. The intensity of the returning signals is dependent on the surface material of the target of the signal. A receiver records the return pulse. With knowledge of the speed of light and the time the signal takes to travel from the aircraft to the object and back to the aircraft, the distance can be computed. With the help of a rotating mirror

PAGE 15

inside the laser transmitter, the laser pulses can be made to sweep through an angle, and by reversing the direction of rotation at a selected angular interval, the laser pulses can be made to scan back and forth along a line. When such a laser ranging system is mounted in an aircraft with the scan line perpendicular to the direction of flight, it produces a sawtooth pattern along the flight path. The width of the strip or "swath" and the spacing between the points measured depend on the laser pulse rate, the scan angle of the laser ranging system and the aircraft height. Errors in the location and orientation of the aircraft, the beam director angle, the atmospheric refraction model and several other sources degrade the coordinates of the surface point, typically to 5 to 10 centimeters (Shrestha and Carter, 1998).

UFL ALSM System Configuration

The data used in this research were collected by the University of Florida's Optech ALTM 1233 system, which operates at a frequency of 33 kHz. Table 2-1 presents the standard ALSM survey parameters used by the GEM center at the University of Florida. The scan angle is usually set to +/- 20 degrees and the aircraft is flown at ~600 m to ~1000 m above ground level. Ashtech Z-12 receivers were used on board the aircraft to gather GPS data, and the attitude of the aircraft was recorded with a Litton LN200A inertial measurement unit (IMU). Various software packages such as Realm 3.x, PosPac 4.x and kinematic and rapid static (KARS) are used at different steps of processing the raw data to generate point cloud data, which is further used to produce digital elevation models and intensity images representing the area of interest. A CCD (Charge-Coupled Device) camera, a DuncanTech MS4100, was used to acquire aerial photography. The camera can be configured to acquire RGB (Red, Green and Blue) images, CIR (Color Infrared) images, or RGB images along with monochrome IR images. It can be triggered externally through a pulse and has three different operating modes for trigger input. The specifications of the camera are presented in Table 2-2.
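As a rough consistency check on the survey parameters in Table 2-1, the swath width, pulses per scan and along-scan point spacing follow directly from the flying height, scan angle, scan rate and pulse rate. The short sketch below reproduces those nominal values; it is a back-of-the-envelope illustration only, and the variable names are ours rather than the manufacturer's.

```python
import math

# Nominal ALTM 1233 survey parameters from Table 2-1 (approximate values).
flying_height_m = 600.0      # above ground level
scan_half_angle_deg = 20.0   # scan angle of +/- 20 degrees
scan_rate_hz = 28.0          # full back-and-forth scan cycles per second
pulse_rate_hz = 33333.0      # laser pulses per second
flying_speed_mps = 60.0      # aircraft ground speed

# Swath width on flat ground below the aircraft.
swath_width_m = 2.0 * flying_height_m * math.tan(math.radians(scan_half_angle_deg))

# Pulses emitted during one one-way sweep across the swath (two sweeps per cycle).
pulses_per_scan = pulse_rate_hz / (2.0 * scan_rate_hz)

# Spacing between consecutive points along one scan line.
along_scan_spacing_m = swath_width_m / pulses_per_scan

# Distance the aircraft advances per full scan cycle ("scan spacing").
scan_spacing_m = flying_speed_mps / scan_rate_hz

print(f"swath width        ~{swath_width_m:.0f} m")         # ~437 m  (table: ~436 m)
print(f"pulses per scan     {pulses_per_scan:.1f}")          # 595.2
print(f"along-scan spacing ~{along_scan_spacing_m:.2f} m")   # ~0.73 m
print(f"scan spacing       ~{scan_spacing_m:.1f} m")         # ~2.1 m  (table: ~2 m)
```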

PAGE 16

Intensity or Spectral Reflectance

The intensity value is a measure of the return signal strength. It measures the peak amplitude of return pulses as they are reflected back from the target to the detector of the LIDAR system. Intensity values are relative rather than absolute and vary with altitude, atmospheric conditions, directional reflectance properties, and the reflectivity of the target. Because these values are relative, the process of creating images from vector intensity data requires the exercise of judgment. The two main mechanisms that alter the intensity and direction of electromagnetic radiation within the atmosphere are absorption and scattering. Though these mechanisms affect all wavebands in the optical spectrum to a varying degree, the effects are variable both in space and time.

Atmospheric Interactions

Electromagnetic radiation interacts with the earth's atmosphere depending on the wavelength of the radiation and the local characteristics of the atmosphere. Scattering is more likely to occur at shorter wavelengths. The most common scattering behavior is known as Rayleigh scattering, which is the main source of haze in remotely sensed imagery. The atmosphere has different levels of absorption at different wavelengths. Regions of the spectrum with relatively high transmission are called atmospheric windows (see Figure 2-2). The energy in some wavebands is almost completely absorbed by the atmosphere.

Surface Material Reflectance

The energy incident upon a target can be separated into three portions, namely energy that is transmitted, absorbed and reflected. Surface material reflectance characteristics may be quantified by the spectral reflectance, which is a percentage measure obtained by dividing the reflected energy in a given waveband by the incident energy. The magnitude of the reflected energy primarily depends on three factors: the magnitude of the incident energy, the roughness

PAGE 17

of the material, and the material type. Normally, the first two factors are regarded as constants. Therefore, only the third factor, material type, is considered. Surface roughness is a wavelength-dependent phenomenon: for a given material, the longer the wavelength, the smoother the material appears. For a specular surface, reflected energy travels only in one direction, such that the reflection angle is the same as the incidence angle. In the case of a lambertian surface, incident energy is reflected equally in all directions. In practical applications, however, most surface materials act neither as specular nor as lambertian reflectors; their roughness lies somewhere between the two extremes.

Figure 2-3 shows the average reflectance over the optical region of the spectrum for three ideal surface materials: dry bare soil, clear water and green vegetation. The graph shows how these surface materials can be separated in terms of their reflectance spectra. It is apparent that vegetation reflectance varies considerably across the wavebands. The highest reflectance values occur around the near-infrared and part of the mid-infrared bands, while the lowest reflectance values occur at about 0.4 μm. Knowledge of surface material reflectance characteristics provides a principle on the basis of which suitable wavelengths can be selected for any particular mission to scan the earth's surface. Such knowledge also provides an important basis for making objects more distinguishable in terms of multiband image manipulations such as overlay or subtraction. It can also be noticed in Figure 2-3 that the reflectance from water surfaces is relatively low, and is nearly zero at wavelengths beyond the visible red. The operational wavelength of the LIDAR system used to gather data for this research is approximately 1.064 μm. Hence it can fairly be assumed that in a coastal zone the reflectance from the waves is expected to be minimal. The

PAGE 18

wet sand is expected to have lower reflectance than dry sand due to the higher absorption caused by the H2O molecules it contains.

Atmospheric Correction

Electromagnetic energy detected by sensors consists of a mixture of energy reflected from or emitted by the ground surface and energy that has been scattered within or emitted by the atmosphere. The magnitude of the electromagnetic energy in the visible and near-infrared region of the spectrum that is detected by a sensor above the atmosphere is dependent on the magnitude of incoming solar energy (irradiance), which is attenuated by the process of atmospheric absorption, and by the reflectance characteristics of the ground surface. Hence, the energy received by the sensor is a function of the incident energy (irradiance), the target reflectance, atmospherically scattered energy (path radiance), and atmospheric absorption. Interpretation and analysis of remotely sensed images in the optical region of the spectrum is based on the assumption that the values associated with the image pixels accurately represent the spatial distribution of ground surface reflectance, and that the magnitude of such reflectance is related to the physical, chemical, and biological properties of the ground surface. The response of the detectors to a uniform input also tends to change over time.

The necessity for atmospheric correction depends on the objectives of the analysis. Correction for these effects is vital if the analysis of a given area is done over time, for example over multiple seasons. For the current analysis, however, the assumption that all the pixels in the study area are equally affected by atmospheric processes, and the fact that the pixels are being compared only with other pixels within the same image, make the application of atmospheric corrections nonessential.

Topographic Correction

Normally, the surface being measured by the remote sensor is assumed to be flat with a lambertian reflectance behavior. Under this assumption, the magnitude of the radiance detected

PAGE 19

by the sensor is affected only by variations in the zenith angle, the wavelength, and the atmospheric interaction. However, this may become invalid in the case of rugged terrain, because the incidence angle will vary with the topographic variation and will further contribute to unwanted weakening or strengthening of the level of radiance detected by the sensor. More specifically, the topographic effect can be defined as the variation in radiance exhibited by inclined surfaces compared with the radiance from a horizontal surface, as a function of the orientation of the surface relative to the radiation source. Moreover, if a non-lambertian surface is assumed for the surface being measured, the sensor position is another important variable that may be considered.

Factors that influence path length, and consequently the radiance detected, include changes in ground topography, variations in aircraft vertical position, and variations in scan angles. Surface topographic variations will also cause distortions in the geometric correction of images. For example, the map to which an image is referenced represents the relationship between features reduced to some datum such as sea level, while the image shows the actual terrain surface. If the terrain surface is significantly above sea level, then the image pixel position will be displaced by an amount proportional to the pixel's elevation above sea level or the corresponding datum used. Figure 2-4 illustrates how elevation changes affect path length. A discussion of how the data used in this analysis are corrected for these effects is presented in the Data Processing section of Chapter 3.
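Chapter 3 describes how the intensities used in this study were actually normalized. Purely as an illustration of the path-length effect discussed above, a common first-order adjustment scales each return by the squared ratio of its measured range to a reference range, following the approximately 1/R^2 falloff of received power. The sketch below assumes that simplified range-squared model and uses invented names; it is not the UF normalization program.

```python
import numpy as np

def normalize_intensity_for_range(intensity, slant_range_m, reference_range_m=600.0):
    """First-order range normalization of lidar return intensity.

    Received power falls off roughly as 1/R^2, so scaling each return by
    (R / R_ref)^2 references it to a common path length. This simplified
    model ignores atmospheric attenuation and incidence-angle effects.
    """
    intensity = np.asarray(intensity, dtype=float)
    slant_range_m = np.asarray(slant_range_m, dtype=float)
    return intensity * (slant_range_m / reference_range_m) ** 2

# Example: three returns with slightly different path lengths, referenced to 600 m.
raw_intensity = np.array([120.0, 118.0, 95.0])
slant_range_m = np.array([598.0, 612.0, 634.0])
print(normalize_intensity_for_range(raw_intensity, slant_range_m))
```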

PAGE 20

Table 2-1. UF's standard ALSM survey parameters for St. Augustine, FL flights
Flying speed: ~60 m/s
Scan spacing: ~2 m
Pulse rate: 33333.0 pulses/sec
Indicated air speed: ~135 nm/h
Scan width: ~436 m
Pulses per scan: 595.2
Scan rate: 28.0 Hz
Scan angle: +/- 20.0 deg
Distance between points along scan: ~0.73 m
Flying height: ~600 m
Flight line spacing: ~90 m
Swath overlap: 345 m

Table 2-2. MS4100 multi-spectral camera specifications
Sensor: 3 CCDs with a colour-separating prism
Pixel size: 0.0074 mm x 0.0074 mm
Image resolution: 1924 x 1075
Focal length: 28 mm
Frame rate: 10 frames per second
Pixel clock rate: 25 MHz
Signal/noise: 60 dB
Digital image output: 8 bits x 4 taps or 10 bits x 3 taps
Programmable functions: gain, exposure time, multiplexing, trigger modes, custom processing
Electronic shutter: range 1/10,000 to 1/10 sec, controlled via RS-232 input
Operating temperature: 0 to 50 °C

PAGE 21

Figure 2-1. LIDAR system components

Figure 2-2. Absorption spectrum of the earth's atmosphere. The curve indicates the total atmospheric transmission.

PAGE 22

Figure 2-3. Typical spectral reflectance (%) for three materials: green vegetation, bare soil and water

Figure 2-4. Changes in path length due to elevation changes

PAGE 23

CHAPTER 3
STUDY AREA AND DATA DESCRIPTION

Location

The University of Florida's GEM Research Center acquired ALSM data along the St. Augustine beach region of Florida in February 2007. The surveys were conducted at approximately low tide using the University of Florida's Optech ALTM 1233 airborne laser scanning system. At a flying height of 600 m, the point density is under 1 point/m². Aerial imagery concurrent in time with the ALSM point data was acquired to enable ground truth labeling and to aid in the segmentation of the intensity images into various beach classes for training. An area approximately 2500 m in length just north of the pier region at St. Augustine beach was chosen for segmentation purposes. This region was selected for the analysis because it contains the highly erosive pier region, which is of significant commercial and recreational importance for tourism and the focus of continual beach nourishment efforts, and because clear, high-resolution aerial imagery was available for this portion of the beach. Figure 3-1 shows the location of the study area.

Data Processing

Different LIDAR systems record different combinations of observables depending on the system configuration and the project requirements. Because there may be more than one return in areas covered with vegetation, which allows a look at the undergrowth, data may be classified as first, second, third or last returns, and bare-earth. Many above-ground features may be classified separately. The first and the last returns are used most often. First returns are the first elevation value that the LIDAR sensor records for a given x,y coordinate; they represent the highest features such as the tree canopy, buildings, etc. Likewise, last returns are the last

PAGE 24

elevation values recorded by the sensor for a given x,y coordinate. The last return is generally assumed to be ground level, except for cases where the above-ground objects are opaque. The data available for this analysis contain the first and last returns, including the corresponding intensity values of the returned pulse. Since the energy transmitted by the sensor is constant, the intensity of the return signal is primarily dependent on factors like target reflectance, path radiance, and atmospheric absorption, as discussed in Chapter 2. Also, since the data used in this analysis come from a single-date acquisition and the altitudes and distances covered are on the lower side, the impact of the atmospheric conditions is considered to be constant. However, the return signal intensity is corrected for topographic effects, although the path length changes may not be dramatic since the variations in the ground topography near a beach are usually minimal. These effects are also significant when a dataset consists of values from different scans, which is not the case with the data used for this research. The raw ALSM intensities were normalized using a UF-developed program [GEM_Rep_2004_07_001] to correct for intensity variations due to changes in path length.

Though no canopy or building structures are present near a beach zone, the data may still contain non-ground points due to people on the beach or other random objects. It is therefore important to eliminate those points by filtering the data before producing a bare earth model. A commercial software package called Terrascan, which can be executed in the popular CAD environment Microstation, is used for this purpose.

Coordinate Conversions

After the filtering process, the data consist of only those points that are representative of the ground. The heights are initially referenced to the NAD83 (North American Datum 1983) horizontal datum and to the GRS 80 (Geodetic Reference System 1980) ellipsoid in the vertical direction. These are then processed with respect to the NAD83 datum and projected to Universal

PAGE 25

Transverse Mercator, Zone 17, meters. The ellipsoid heights were transformed with respect to the NAVD88 (North American Vertical Datum 1988) vertical datum using the GEOID03 (NOAA/NGS) geoid model. These transformations were done so that the elevations could then be referenced to local tidal datums for any additional local computations.

Rasterization

The point data (vector format) are converted to raster format because of the raster's ability to represent continuous surfaces and to support advanced spatial and statistical analysis of the surface. In its simplest form, a raster consists of a matrix of cells organized into rows and columns (or a grid) where each cell contains a value representing information. In raster datasets, each cell (which is also known as a pixel) has a value. The cell values represent the phenomenon portrayed by the raster dataset, such as a category, magnitude, height, or spectral value. The dimension of the cells can be as large or as small as needed to represent the surface conveyed by the raster dataset and the features within the surface, such as a square kilometer, square foot, or even a square centimeter. The cell size determines how coarse or fine the patterns or features in the raster will appear: the smaller the cell size, the more detailed the raster will be. However, the greater the number of cells, the longer it will take to process and the greater the demand for storage space. If a cell size is too large, information may be lost or subtle patterns may be obscured. For example, if the cell size is larger than the width of a road, the road may not exist within the raster dataset. The location of each cell is defined by the row and column where it is located within the raster matrix. Essentially, the matrix is represented by a Cartesian coordinate system, in which the rows of the matrix are parallel to the x-axis and the columns to the y-axis of the Cartesian plane. Row and column values begin with 0.

Once the data were filtered and transformed, they were gridded with a cell size of 1 m using the commercial software package Surfer, so that the cell value represents intensity. Ordinary

PAGE 26

Kriging was used as the interpolation method. Kriging is based on the regionalized variable theory, which assumes that the spatial variation in the phenomenon represented by the z-values is statistically homogeneous throughout the surface (for example, the same pattern of variation can be observed at all locations on the surface). Each grid was formed using a linear variogram model and a nugget effect with the error variance set to 0.07 m. This error represents the uncertainty in UF's ALSM system, which is approximately equal to 7 cm vertical.

Ground Truth Generation

Ground truth refers to measurements of an observed quantity that can be used to calibrate quantitative observations or to validate new measurements. It is carried out to verify the correctness of information by use of ancillary information that is collected at the site. The ground truth data enable the interpretation and analysis of what is being sensed. In the present case of a classification problem, this step demands significant importance because the ground truth data are used as a reference against which the performance of a classifier is gauged. The accuracy of a supervised or an unsupervised classifier is determined by comparing the labels assigned to the pixels by the classifier with the true labels of the same pixels as per the ground truth.

For the research presented here, the aerial photographs acquired simultaneously with the LIDAR data acquisition are used to generate the ground truth data. The elevation of the aircraft at the time of the data acquisition averaged about 550 m. At this height, the camera was able to produce images with a pixel size of 15 cm. The images were recorded in three channels (red, blue and green), enabling the production of RGB color composite images. The images were then geo-referenced and geometrically corrected so that the scale of the photographs is uniform and to remove distortions caused by the camera optics, camera tilt and variations in the surface elevation. The ortho-rectification software Terraphoto was used to geo-reference the images by using ground laser points as the projection surface. The software

PAGE 27

uses the position and orientation information of the aircraft from the GPS and the IMU sensors for these purposes. The accuracy of the images was analyzed with respect to the LIDAR data by using easily distinguishable features such as road markings or building edges as control points on the images and registering them to corresponding points on the LIDAR digital surface model (DSM). The accuracy of the images was found to be about 3 meters.

A great degree of attention is required while generating the ground truth data because the assessment of the performance of the classifiers is directly dependent on this reference data. For the current analysis, the rectified aerial images in conjunction with the intensity grids are used to manually digitize the lines that delineate the dry sand/wet sand and wet sand/water portions of the beach. The aerial images are overlaid with the gray-scale intensity grids to aid in the visual examination of the area. Figure 3-2 shows the test area polygon with the manually digitized break-lines. To check the repeatability of this manual digitization, a 200 m segment of the test data was chosen and the break-lines were again digitized manually by a different person. This is done to check the validity of the original ground truth data: if the variance between the results produced by two different persons is low, then the original ground truth data are considered to be valid; otherwise, regeneration of the ground truth with more careful measures is highly recommended. Figure 3-3 shows an overlay of the original break-lines and the portions of the break-lines digitized by a different person. The mean and standard deviation of the differences between the two cases were found to be approximately 0.2 m and 1.3 m for the dry-wet line and 1.6 m and 3.3 m for the wet-water line, which for this study were considered to be small. As mentioned earlier, any errors in the ground truth have a direct impact on the final accuracy measures. When the ground truth was shifted by a mean of about two meters for a small portion of the test area, the final accuracy values were found to deviate by less than one percent.
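Returning to the rasterization step described above: the grids in this study were produced in Surfer using ordinary kriging with a linear variogram and a nugget effect. As a much simpler stand-in for readers without that software, the sketch below bins the point intensities onto a 1 m grid by cell averaging with numpy; it reproduces the raster structure but not the kriging interpolation, and all names are illustrative.

```python
import numpy as np

def grid_intensity(x, y, intensity, cell_size=1.0):
    """Bin scattered (x, y, intensity) points onto a regular raster.

    Each cell value is the mean intensity of the points that fall in it;
    empty cells are NaN. This is only a simple stand-in for the ordinary
    kriging interpolation actually used to build the study grids.
    """
    x, y, intensity = map(np.asarray, (x, y, intensity))
    col = ((x - x.min()) / cell_size).astype(int)
    row = ((y - y.min()) / cell_size).astype(int)

    sums = np.zeros((row.max() + 1, col.max() + 1))
    counts = np.zeros_like(sums)
    np.add.at(sums, (row, col), intensity)
    np.add.at(counts, (row, col), 1.0)

    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(counts > 0, sums / counts, np.nan)

# Example with synthetic points scattered over a 10 m x 5 m patch.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 10.0, 200)
y = rng.uniform(0.0, 5.0, 200)
i = rng.uniform(20.0, 200.0, 200)
print(grid_intensity(x, y, i).shape)
```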

PAGE 28

Figure 3-1. Location of the study area near St. Augustine beach

Figure 3-2. Test area polygon and manually digitized break-lines. One manually digitized line delineates the dry and wet portions of the beach; the other delineates the wet and water portions.

PAGE 29

Figure 3-3. Overlay of original break-lines and lines digitized by person 2

PAGE 30

CHAPTER 4
FEATURE SELECTION FOR SEGMENTATION

Automated Profiling

A computer routine was developed to mine the image by extracting intensity values along cross-shore profile lines oriented roughly orthogonal to the shoreline. This provides the x,y coordinate and intensity value along each profile in 1 m increments. These profile lines are extracted every 2 m in the along-shoreline (parallel) direction and extend in the cross-shore (perpendicular) direction from the dune line to 10 m past the water line. The aerial imagery, which was geo-referenced to the same coordinate frame as the ALSM imagery, was used to segment each profile line into three classes: dry beach, wet beach, and water. The dry beach is that portion of the beach that does not directly interact with the incoming waves. In contrast, the wet beach is in direct contact with the incoming waves due to the continuous wave action. All three classes are visible in Figure 4-1.

Feature Extraction

Once the profiles were extracted and classified into dry, wet, and water, several features based on intensity were computed for each profile line on a per-class basis. Take for example a hypothetical feature $f_1$. Then for a profile $l$, values of $f_1$ are computed separately for the profile points in each class $C_i$, for $i = 1, 2, 3$. Since all three classes are present in all profiles, we obtain $f_1^{(m)}(C_1)$ for feature 1 and class 1, where $m$ indexes the profile number. Results are accumulated over all features, profiles, and classes. This approach allows us to examine inter-class separation and the class-conditional probabilities $p(f_j \mid C_i)$ for $i = 1, 2, 3$, with $j$ indexing the feature set.
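As a concrete illustration of this per-profile, per-class bookkeeping, the minimal numpy sketch below computes the kind of intensity features enumerated on the next page for the points of one class along one cross-shore profile. The function and field names are ours, not those of the routine actually used, and the sampling interval is left as a parameter.

```python
import numpy as np

def profile_features(intensity, spacing_m=1.0):
    """Intensity features for the points of one class along one profile.

    Mirrors the feature set described in the text: order statistics and
    spread, means of a smoothed gradient and curvature, and the slope of
    a regression line fit through the profile points.
    """
    intensity = np.asarray(intensity, dtype=float)
    # Moving-average smoothing (window of 5 samples) before differencing.
    smoothed = np.convolve(intensity, np.ones(5) / 5.0, mode="same")
    gradient = np.gradient(smoothed, spacing_m)     # centralized differences
    curvature = np.gradient(gradient, spacing_m)    # gradient of the gradient
    distance = np.arange(intensity.size) * spacing_m
    slope = np.polyfit(distance, intensity, 1)[0]   # regression-line slope
    return {
        "min": intensity.min(),
        "max": intensity.max(),
        "median": float(np.median(intensity)),
        "mean": intensity.mean(),
        "std": intensity.std(),
        "mean_gradient": gradient.mean(),
        "mean_curvature": curvature.mean(),
        "slope": slope,
    }

# Example: features for the dry-beach points of one cross-shore profile.
dry_segment = [180, 176, 182, 171, 165, 168, 160, 158, 150, 149]
print(profile_features(dry_segment))
```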

PAGE 31

The following features are mined from the intensity profiles: minimum intensity, maximum intensity, median intensity, mean intensity, standard deviation of intensity, mean curvature, mean gradient, and intensity slope. Most of these features are self-explanatory, but some require further detail. The mean gradient is based on computing a local gradient every 2 meters along the intensity profile, which is first smoothed using a moving average filter of window size 5, and then taking the mean of those gradient values. The gradient was computed using a centralized-difference approximation. The mean curvature was computed by taking the gradient of the gradient values and then the mean of those values for the profile. The intensity slope was computed using a regression line through the profile points.

There are other feature measures that can be derived from intensity data, such as texture measures. However, many of these measures are based on image processing techniques that require moving windows. As explained previously, features are extracted from the values along a profile line as opposed to using a moving 2D window or simple data clustering. This approach was selected to remain consistent with the standard practices used in coastal erosion studies, which utilize cross-shore and along-shore measurements as the most natural coordinate frame. Traditional methods are generally based on manually surveyed cross-shore profiles spaced at equal intervals. The major difference is that ALSM typically provides a three to four order of magnitude increase in profile sampling density compared to manual surveying methods.

Once the features are computed for each class, their class-conditional probability density functions (PDFs) are used to assess the PDF separability using divergence measures. The features that correspond to the most separable PDFs are more likely to yield robust classification. Figure 4-2 shows class-conditional histograms for the median intensity feature. As observed in

PAGE 32

these plots, the data in feature space are multi-modal and non-Gaussian in appearance, as is often the case with ALSM data. The non-parametric Parzen windowing method was selected to estimate the PDFs from the data because a parametric form cannot be assumed. The Parzen window method uses a d-dimensional histogram to estimate the probability density $p_n(x)$ within a hypercube (Duda, Hart and Stork, 2001) as

$$ p_n(x) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{h_n^{d}} \, \varphi\!\left( \frac{x - x_i}{h_n} \right) $$   (4-1)

where $n$ is the number of points for feature $x$ and $h_n$ is the length of the edge of the hypercube volume. Here $d = 1$ since we extract 1-D profiles. $\varphi(\cdot)$ is the window function and was chosen to be a univariate Gaussian window because the features can be spatially correlated (Duda, Hart and Stork, 2001):

$$ \varphi(u) = \frac{1}{\sqrt{2\pi}} \, e^{-u^{2}/2} $$   (4-2)

Crucial to Parzen estimation is the selection of the edge length $h_n$. If $h_n$ is too large, the estimate over-smoothes the data; if it is too small, the estimate approaches a sum of delta functions and becomes too spiky. A reasonable value for $h_n$ was determined empirically to be 1/30th of the range of values for each feature.

Generally, the more separation between classes for a given feature, the more probable that the feature will lead to successful classification. To assess inter-class separation across the estimated class-conditional PDFs, two performance metrics based on relative entropy, i.e. the Kullback-Leibler divergence ($D_{kl}$), are used: the Jensen-Shannon divergence (JSD) and a normalized form of JSD. JSD based on $D_{kl}$ has the following form (Tsai, Tzeng and Wu, 2005):

$$ \mathrm{JSD}(P, Q) = \frac{1}{2} D_{kl}\!\left( P \,\middle\|\, \frac{P + Q}{2} \right) + \frac{1}{2} D_{kl}\!\left( Q \,\middle\|\, \frac{P + Q}{2} \right) $$   (4-3)

PAGE 33

where $P$ and $Q$ are the class-conditional PDFs for a given feature; refer to (Duda, Hart and Stork, 2001). JSD is a symmetric form of $D_{kl}$ and is non-negative. Taking the square root of the JSD makes it satisfy the triangle inequality and all other properties of a metric (Tsai, Tzeng and Wu, 2005). The square root form of JSD is often used for feature selection in classification problems, where $P$ and $Q$ are the conditional PDFs of a feature under two different classes. Therefore $\sqrt{\mathrm{JSD}}$, referred to hereafter simply as JSD, was selected as a measure to assess feature separation.

When assessing which features provide more divergence between classes using entropy-based measures such as $D_{kl}$, one must be cautious. Some features may have larger inherent entropies, and this can pose problems when comparing across different feature spaces. The standard $D_{kl}$ definition of the divergence is biased towards large entropies (Guha and Sarkar, 2006). Hence, we use a form of JSD that is normalized by the entropy (a measure of randomness) of $P$ and $Q$:

$$ \mathrm{NJSD}(P, Q) = \frac{ D_{kl}\!\left( P \,\middle\|\, \frac{P + Q}{2} \right) }{ 2 H(P) } + \frac{ D_{kl}\!\left( Q \,\middle\|\, \frac{P + Q}{2} \right) }{ 2 H(Q) } $$   (4-4)

where $H$ is the entropy. This metric is labeled normalized-JSD (NJSD) and is used as a second metric.

The classification performance of each feature was tested using a two-class naive Bayes classifier and k-fold cross-validation, a method that randomly divides the data into a test set and a training set k times and classifies each of the k splits. This provides a means to test the generalization capability of each feature with the limited training data. Receiver operating characteristic (ROC) curves were generated for each feature for each of the k classifications. ROC curves are 2-D graphs that depict classifier performance in which the


The classification performance of each feature was tested using a two-class naive Bayes classifier and k-fold cross-validation, a method that randomly divides the data into a test and a training set k times and classifies each of the k partitions. This provides a means to test the generalization capability of each feature with the limited training data. Receiver operating characteristic (ROC) curves were generated for each feature for each of the k classifications. ROC curves are 2-D graphs that depict classifier performance in which the true positive rate is plotted on the Y axis and the false positive rate is plotted on the X axis. Here, positive refers to Class 1 and negative to Class 2 (e.g., the wet vs. dry classes). By varying a decision threshold, ROC curves for the classifier can be generated. Figure 4.3 displays example ROC curves for three features. The diagonal line represents random guessing. An important property of ROC curves is that they are insensitive to changes in the prior class distributions. Refer to (Fawcett, 2006) for more on ROC curves.

To evaluate classifier performance across features, the area under the curve (AUC) was calculated for each of the k ROC curves for each feature using trapezoidal integration and then averaged. AUC is a scalar measure that varies from 0 for no classification success, through 0.5 for random guessing, to 1 for perfect classification. AUC is equivalent to the probability that the classifier will rank a randomly chosen positive class instance higher than a randomly chosen negative class instance, which is equivalent to the Wilcoxon test of ranks (Fawcett, 2006). Thus, the features determined to be most separable are expected to provide the best classification performance and have an AUC close to 1. In this regard, the AUC acts as a measure of feature separation.
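
The following MATLAB sketch shows how an ROC curve and its AUC can be obtained for one feature by sweeping a decision threshold. It is an illustrative, simplified version: the averaging over the k cross-validation folds is omitted, the function name is an assumption, and higher scores are assumed to indicate the positive class.

function [fpr, tpr, auc] = roc_auc(scores, labels)
% ROC_AUC  ROC curve by threshold sweeping and AUC by trapezoidal integration.
%   scores : feature value (or classifier score) for each sample
%   labels : true class for each sample (1 = positive class, 0 = negative class)
scores = scores(:);  labels = labels(:);        % ensure column vectors
thr  = sort(unique(scores), 'descend');
tpr  = zeros(numel(thr) + 2, 1);
fpr  = zeros(numel(thr) + 2, 1);
nPos = sum(labels == 1);
nNeg = sum(labels == 0);
for t = 1:numel(thr)
    pred = scores >= thr(t);                    % vary the decision threshold
    tpr(t+1) = sum(pred & labels == 1) / nPos;  % true positive rate (Y axis)
    fpr(t+1) = sum(pred & labels == 0) / nNeg;  % false positive rate (X axis)
end
tpr(end) = 1;  fpr(end) = 1;                    % anchor the curve at (0,0) and (1,1)
auc = trapz(fpr, tpr);                          % area under the curve
end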


Table 4.1 presents a ranking of the separation between the wet and dry beach classes for each feature, together with the classifier results expressed as mean AUC (a rank of 1 is most separable, 8 least separable). The high ranking of the mean and median is expected, since dry sand tends to reflect more than wet sand at the system's operating wavelength of 1064 nm. Mean gradient and curvature provide the least separation, as these values are texture measures. One interesting aspect is the subtle difference between the JSD and NJSD rankings, such as for the max and mean features. These differences can be interpreted as the max feature having more inherent entropy than the mean feature; by normalizing, this uncertainty is lessened, resulting in NJSD indicating more separation.

As expected, the most separable centroidal (central-tendency) features provide the best classification compared to the texture measures, as observed in the mean AUC. Good classification is achieved for some features due to the very distinct change in surface reflectance when transitioning from dry to wet beach. The rankings for wet beach vs. water are presented in Table 4.2, where the mean and median are again highly ranked, with the median having the highest mean AUC.


Table 4-1. Dry sand vs. wet sand separability rankings

Table 4-2. Wet sand vs. water separability rankings

Figure 4-1. Digital elevation model and aerial imagery of the study area


Figure 4-2. Class-conditional PDFs estimated using Parzen windowing for median intensity. Notice the non-Gaussian, multi-modal nature of the data and the overlapping regions, which result in a nonzero Bayes classification error.

Figure 4-3. Example of averaged ROC curves ("Averaged ROC Plot", wet as the positive class and water as the negative class; false positive rate on the X axis, true positive rate on the Y axis; curves shown for the slope, median, and mean intensity features). The dashed line represents the classifier performance of random guessing.


CHAPTER 5
CLASSIFICATION METHODS

Classification is an abstract representation of the field situation using well-defined diagnostic criteria. A classification describes the systematic framework with the names of the classes, the criteria used to distinguish them, and the relations between classes. Classification thus necessarily involves the definition of class boundaries that should be clear, precise, and based upon objective criteria. A classification should therefore be scale independent, meaning that the classes at all levels of the system should be applicable at any scale or level of detail, and source independent, implying that it is independent of the means used to collect the information, whether satellite imagery, aerial photography, field survey, or some combination of them is used.

Bayes Classifier

The Bayes classifier is a simple probabilistic classifier based on applying Bayes' theorem with strong statistical independence assumptions. Despite its simplicity, naive Bayes can often outperform more sophisticated classification methods. Depending on the precise nature of the probability model, naive Bayes classifiers can be trained very efficiently in a supervised learning setting. In many practical applications, parameter estimation for naive Bayes models uses the method of maximum likelihood. An advantage of the naive Bayes classifier is that it requires only a small amount of training data to estimate the parameters (means and variances of the variables) necessary for classification. Because the variables are assumed independent, only the variances of the variables for each class need to be determined and not the entire covariance matrix.

Abstractly, the probability model for a classifier is the conditional model

p(C \mid F_1, \dots, F_n)    (5-1)

over a dependent class variable C with a small number of outcomes or classes, conditional on several feature variables F_1 through F_n. If the number of features n is large, or when a feature


can take on a large number of values, then basing such a model on probability tables is infeasible. The model is therefore reformulated to make it more tractable. Using Bayes' theorem,

p(C \mid F_1, \dots, F_n) = \frac{p(C)\, p(F_1, \dots, F_n \mid C)}{p(F_1, \dots, F_n)}    (5-2)

In plain English, the above equation can be written as

\text{posterior} = \frac{\text{prior} \times \text{likelihood}}{\text{evidence}}    (5-3)
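
Under the Gaussian assumption adopted later in Chapter 6, Equation 5-2 reduces to estimating a mean, a variance, and a prior for each class and feature. The MATLAB sketch below is an illustrative implementation of such a Gaussian naive Bayes classifier; the function name and interface are assumptions, not the code developed for this study.

function labels = naive_bayes_classify(Xtrain, ytrain, Xtest)
% NAIVE_BAYES_CLASSIFY  Gaussian naive Bayes classifier (Eq. 5-2).
%   Xtrain : n-by-d training features;  ytrain : n-by-1 class labels (1..K)
%   Xtest  : m-by-d feature vectors to classify
classes = unique(ytrain);
K = numel(classes);
m = size(Xtest, 1);
logpost = zeros(m, K);
for k = 1:K
    Xk     = Xtrain(ytrain == classes(k), :);
    mu     = mean(Xk, 1);
    sigma2 = var(Xk, 0, 1);                 % per-feature variances only
    prior  = size(Xk, 1) / numel(ytrain);
    % log p(C) + sum_j log p(F_j | C) with independent Gaussian features
    loglik = -0.5 * sum((Xtest - mu).^2 ./ sigma2 + log(2*pi*sigma2), 2);
    logpost(:, k) = log(prior) + loglik;
end
[~, idx] = max(logpost, [], 2);             % the evidence term is common to all classes
labels = classes(idx);
end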


K-means Clustering Algorithm

The k-means algorithm clusters objects, based on their attributes, into k partitions. It is similar to the expectation-maximization algorithm for mixtures of Gaussians in that both attempt to find the centers of natural clusters in the data, and it assumes that the object attributes form a vector space. The objective it tries to achieve is to minimize the total intra-cluster variance, or squared error function,

V = \sum_{i=1}^{k} \sum_{x_j \in S_i} \left\| x_j - \mu_i \right\|^{2}    (5-4)

where there are k clusters S_i, i = 1, 2, \dots, k, and \mu_i is the centroid, or mean point, of all the points x_j \in S_i.

The most common form of the algorithm uses an iterative refinement heuristic known as Lloyd's algorithm (a method for evenly distributing samples or objects, usually points). Lloyd's algorithm starts by partitioning the input points into k initial sets, either at random or using some heuristic. It then calculates the mean point, or centroid, of each set. It constructs a new partition by associating each point with the closest centroid. The centroids are then recalculated for the new clusters, and the algorithm is repeated by alternating these two steps until convergence, which is reached when the points no longer switch clusters (or, alternatively, when the centroids no longer change). Lloyd's algorithm and k-means are often used synonymously, but in reality Lloyd's algorithm is a heuristic for solving the k-means problem. Unfortunately, for certain combinations of starting points and centroids, Lloyd's algorithm can converge to a poor local solution. In terms of performance, the algorithm is not guaranteed to return a global optimum; the quality of the final solution depends largely on the initial set of clusters and may, in practice, be much poorer than the global optimum. Since the algorithm is extremely fast, a common method is to run it several times and return the best clustering found.
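
A compact MATLAB sketch of Lloyd's algorithm as described above is given below. It is illustrative only (random initialization, a simple guard for empty clusters, and no restarts), not the classification code used in Chapter 6.

function [labels, centroids] = kmeans_lloyd(X, k)
% KMEANS_LLOYD  Lloyd's algorithm for the k-means problem (Eq. 5-4).
%   X : n-by-d data matrix;  k : number of clusters
n = size(X, 1);
centroids = X(randperm(n, k), :);           % random initial centroids
labels = zeros(n, 1);
changed = true;
while changed
    % Assignment step: associate each point with the closest centroid
    dist = zeros(n, k);
    for j = 1:k
        dist(:, j) = sum((X - centroids(j, :)).^2, 2);
    end
    [~, newLabels] = min(dist, [], 2);
    changed = any(newLabels ~= labels);
    labels = newLabels;
    % Update step: recalculate the centroid of each cluster
    for j = 1:k
        if any(labels == j)
            centroids(j, :) = mean(X(labels == j, :), 1);
        end
    end
end
end

In practice the function is run several times with different random initializations, and the partition with the lowest total squared error (Eq. 5-4) is retained.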


CHAPTER 6
CLASSIFICATION PERFORMANCE

For the purpose of the segmentation computations, the three separable classes, namely dry sand, wet sand, and water, are assigned labels 1, 2, and 3 respectively. In this analysis, a classification is attempted using the two methods (Bayes and k-means) discussed in Chapter 5. The popular numerical computing environment MATLAB was used to implement these methods.

The Bayes classifier is used as a supervised method to establish a relationship between a pattern and a class label. The relationship between the object and the class label in this case is one-to-many, producing a fuzzy classification. To determine the decision boundaries in the feature space and train the classifier, certain assumptions are made. However, if the assumptions are not congruent with the data, the results may be misleading and erroneous. Hence, histograms of a 500 m portion of the study area, which amounts to about 20% of the data and represents all three classes, are plotted to allow visual examination of the data and to determine whether it can be assumed to be Gaussian. For the data used, it was found from the shape of these histograms that a Gaussian assumption can be made. When working over larger areas with more training data, however, using probability density functions computed with Parzen windowing is recommended and may lead to a better fit; in this case a Gaussian fit was chosen since the study area is small. Figure 6.1 shows the histograms of the training data used to examine the nature of the data.

In certain classification experiments, information classes often overlap. In the spectral domain, this implies that the reflectance, emittance, or back-scatter characteristics of different classes may be similar. In the spatial domain, the implication is that any one object may contain areas representative of more than one information class; this is the mixed pixel problem. Spatial overlap of classes is the main barrier to the achievement of high


classification accuracy. To address this concern to some extent, some post-processing of the classified results is conducted.

Figure 6.2 presents a graph of the probability density functions for the three classes, computed from the training data, together with the decision boundaries. The PDFs were estimated by calculating the mean and variance of each information class. The boundary of separation between the water and wet sand classes was found to be 42.23 (rounded to 42 for computation), and between the wet sand and dry sand classes 93.5 (rounded to 94). A majority filter is used as a generalization tool to identify small, isolated areas that have been misclassified and to assign more reliable values to the pixels that make up those areas. It replaces cells in a grid based on the majority of their contiguous neighboring cells. Two kernels, of sizes four and eight neighbors, are used.

The k-means clustering method is used as an unsupervised method. The number of clusters (3 for this analysis) into which the data will be partitioned is provided as an input to the classifier.
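
For illustration, the thresholding implied by these decision boundaries and a simple majority filter can be sketched in MATLAB as follows. This is not the code used in the study: the function name is an assumption, border cells are left unchanged, and a plurality vote (MATLAB's mode) stands in for the majority rule.

function labels = classify_and_filter(intensity, useEightNeighbors)
% Threshold an intensity grid into dry sand (1), wet sand (2) and water (3)
% using the decision boundaries of 94 and 42, then apply a majority filter.
labels = ones(size(intensity));              % dry sand by default
labels(intensity <= 94) = 2;                 % wet sand
labels(intensity <= 42) = 3;                 % water

[rows, cols] = size(labels);
filtered = labels;
for r = 2:rows-1
    for c = 2:cols-1
        if useEightNeighbors
            nb = labels(r-1:r+1, c-1:c+1);
            nb = nb([1 2 3 4 6 7 8 9]);      % the eight contiguous neighbors
        else
            nb = [labels(r-1,c) labels(r+1,c) labels(r,c-1) labels(r,c+1)];
        end
        filtered(r, c) = mode(nb(:));        % label held by most neighbors
    end
end
labels = filtered;
end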


Accuracy Assessment

No classification is complete until its accuracy has been assessed. To estimate the classification accuracy, the level of agreement between the labels assigned by the classifier and the class allocations based on the collected ground data is assessed. That is, the labels assigned using the ground truth imagery to the 500 m portion of the data treated as the training data are compared with the corresponding labels assigned by each of the classifiers tested.

Confusion Matrix

A confusion (or error) matrix is used for the classification accuracy assessment. A confusion matrix is a square array of dimension n x n, where n is the number of classes. Each column of the matrix represents the instances in a predicted class, while each row represents the instances in an actual class. Several indices of classification accuracy can be derived from the confusion matrix. The overall accuracy is obtained by dividing the sum of the main diagonal entries of the confusion matrix by the total number of pixels. This overall view treats the classes as a whole and does not provide specific information about the accuracy of each individual class, nor does it relate to the spatial pattern of errors. In order to assess the accuracy of each information class separately, the producer's accuracy and the consumer's accuracy are computed. The producer's accuracy measures the proportion of pixels of a class that are correctly identified by the classifier and is a measure of omission error. The consumer's accuracy measures the proportion of pixels identified by the classifier as belonging to class i that agree with the test data.

Table 6.1 presents the confusion matrix for the three information classes classified using the Bayes classifier. The numbers of sample pixels for the classes in the training data set are 43437, 39967, and 23438 respectively. The values in the first row of Table 6.1 indicate that of the total 43437 pixels, 42662 are classified as dry sand, 775 are classified as wet sand, and no pixels are classified as water. The main diagonal entries of the matrix represent the number of pixels that have the same classification in the training data and from the classifier; these are the pixels considered to be correctly classified. It can be observed from the tables that the performance of the classifiers is high when distinguishing between dry sand and the other information classes. Tables 6.2 and 6.3 present the matrices for the Bayes classifier after applying a majority filter to the classified output; Table 6.2 corresponds to a four-neighbor majority filter, whereas Table 6.3 corresponds to an eight-neighbor filter. Table 6.4, Table 6.5, and Table 6.6 present the corresponding results using the k-means classifier. It can be noted from the tables that the accuracy values for the two classifiers are similar; both the supervised and the unsupervised methods in this case have performed well and yielded a good classification.
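
These indices can be computed directly from the label grids. The following MATLAB sketch (illustrative, with assumed function and variable names) builds the confusion matrix and derives the overall, producer's, and consumer's accuracies.

function [overall, producers, consumers, CM] = accuracy_indices(truth, predicted, nClasses)
% Confusion matrix (rows = actual class, columns = predicted class) and
% the accuracy indices derived from it.
CM = zeros(nClasses);
for i = 1:numel(truth)
    CM(truth(i), predicted(i)) = CM(truth(i), predicted(i)) + 1;
end
N = sum(CM(:));
overall   = sum(diag(CM)) / N;             % sum of diagonal entries / total pixels
producers = diag(CM)  ./ sum(CM, 2);       % per actual class (omission error)
consumers = diag(CM)' ./ sum(CM, 1);       % per predicted class (commission error)
end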


Also, there is no significant change in the accuracy values after the majority filter has been applied. Usually, the impact of a majority filter is more apparent when there are many small, isolated single cells or groups of cells throughout the image. The overall classification accuracy for the Bayes classifier (refer to Table 6.1) is calculated as

(42662 + 36848 + 22080) / 106842 = 101590 / 106842 = 95.08 %

The producer's accuracy for each information class (refer to Table 6.1) is calculated as

Dry sand: 42662 / 43437 = 98.22 %
Wet sand: 36848 / 39967 = 92.19 %
Water: 22080 / 23438 = 94.20 %

The consumer's accuracy for each information class (refer to Table 6.1) is calculated as

Dry sand: 42662 / 44410 = 96.06 %
Wet sand: 36848 / 38975 = 94.54 %
Water: 22080 / 23457 = 94.12 %

However, these accuracy measures are based only on the principal diagonal, the rows, or the columns of the confusion matrix and do not use the information from the whole matrix. Hence a multivariate index, the kappa coefficient, is used.

Kappa Coefficient

The kappa coefficient uses all of the information in the confusion matrix so that the chance allocation of labels can be taken into consideration. The kappa coefficient (κ) is defined by


\kappa = \frac{N \sum_{i=1}^{r} x_{ii} - \sum_{i=1}^{r} x_{i+} x_{+i}}{N^{2} - \sum_{i=1}^{r} x_{i+} x_{+i}}    (6-1)

where r is the number of columns (and rows) in the confusion matrix, x_ii is entry (i, i) of the confusion matrix, x_i+ and x_+i are the marginal totals of row i and column i respectively, and N is the total number of observations. The kappa coefficient takes not just the principal diagonal entries, but also the off-diagonal entries, into consideration. The higher the value of kappa, the better the classification performance. If all the information classes are correctly identified, kappa takes a value of 1; as the values of the off-diagonal entries increase, the value of kappa decreases. Since the interpretation of the kappa statistic is based on the assumption of a multinomial sampling model, the assessment may become less reliable if the test data are not chosen properly.
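
A minimal MATLAB sketch of Equation 6-1 is shown below (the function name is illustrative); applied to the matrix in Table 6-1 it reproduces the reported value of approximately 0.92.

function kappa = kappa_coefficient(CM)
% KAPPA_COEFFICIENT  Kappa coefficient of agreement from a confusion matrix (Eq. 6-1).
N         = sum(CM(:));                    % total number of observations
diagSum   = sum(diag(CM));                 % sum of the x_ii entries
rowTot    = sum(CM, 2);                    % marginal totals x_i+
colTot    = sum(CM, 1)';                   % marginal totals x_+i
chanceSum = sum(rowTot .* colTot);         % sum of the x_i+ * x_+i products
kappa = (N*diagSum - chanceSum) / (N^2 - chanceSum);
end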


Misclassified Pixels

In the classification process, a single cell or a small group of cells may be misclassified as an entity different from the sea of cells surrounding it. The spatial distribution of misclassified points can sometimes provide good insight into the features at that location. Figure 6.3 shows an overlay of the ground truth break-lines (the dry sand/wet sand line and the wet sand/water line) with the classification results from the Bayes classifier. Small zones of red pixels can be noticed within the regions of green pixels. The classifier has incorrectly classified those pixels as dry sand, while in reality they belong to the group of pixels that surrounds them. These small regions could represent the small sand dunes found on a beach, which have a relatively higher elevation than the surrounding pixels. Oftentimes these small dunes are not reached by the waves, so they remain dry and therefore exhibit higher reflectance.

The distributions of some interesting misclassifications are presented in Figure 6.4 and Figure 6.5. Figure 6.4 shows the pixels that belong to the dry sand class but have been classified as wet sand. The adjacent image shows the same plot without the ground truth break-lines and with the pixels highlighted for better visualization. It can be observed from the figure that the misclassification is higher along the edges that separate the dry and wet zones of the beach. Figure 6.6 shows the classification results using the k-means classifier.


Table 6-1. Confusion matrix for Bayes classifier

                          Dry     Wet     Water   Total    Producer's accuracy (%)
Dry                       42662   775     0       43437    98.22
Wet                       1742    36848   1377    39967    92.19
Water                     6       1352    22080   23438    94.20
Total                     44410   38975   23457   106842
Consumer's accuracy (%)   96.06   94.54   94.12
Overall accuracy (%)      95.08
Kappa coefficient (κ)     0.92

Table 6-2. Confusion matrix for Bayes classifier after applying a majority filter (4 neighbors)

                          Dry     Wet     Water   Total    Producer's accuracy (%)
Dry                       42860   577     0       43437    98.67
Wet                       1426    37257   1284    39967    93.21
Water                     3       1080    22355   23438    95.37
Total                     44289   38914   23639   106842
Consumer's accuracy (%)   96.77   95.74   94.56
Overall accuracy (%)      95.90
Kappa coefficient (κ)     0.93


Table 6-3. Confusion matrix for Bayes classifier after applying a majority filter (8 neighbors)

                          Dry     Wet     Water   Total    Producer's accuracy (%)
Dry                       42871   566     0       43437    98.69
Wet                       1413    37296   1258    39967    93.31
Water                     2       1035    22401   23438    95.57
Total                     44286   38897   23659   106842
Consumer's accuracy (%)   96.80   95.88   94.68
Overall accuracy (%)      96.00
Kappa coefficient (κ)     0.93

Table 6-4. Confusion matrix for K-means classifier

                          Dry     Wet     Water   Total    Producer's accuracy (%)
Dry                       41555   1882    0       43437    95.67
Wet                       496     38236   1235    39967    95.67
Water                     4       1541    21893   23438    93.41
Total                     42055   41659   23128   106842
Consumer's accuracy (%)   98.81   91.78   94.66
Overall accuracy (%)      95.17
Kappa coefficient (κ)     0.92


Table 6-5. Confusion matrix for K-means classifier after applying a majority filter (4 neighbors)

                          Dry     Wet     Water   Total    Producer's accuracy (%)
Dry                       41890   1547    0       43437    96.44
Wet                       334     38474   1159    39967    96.26
Water                     1       1248    22189   23438    94.67
Total                     42225   41269   23348   106842
Consumer's accuracy (%)   99.21   93.23   95.04
Overall accuracy (%)      95.99
Kappa coefficient (κ)     0.93

Table 6-6. Confusion matrix for K-means classifier after applying a majority filter (8 neighbors)

                          Dry     Wet     Water   Total    Producer's accuracy (%)
Dry                       41899   1538    0       43437    96.46
Wet                       318     38515   1134    39967    96.37
Water                     3       1190    22245   23438    94.91
Total                     42220   41243   23379   106842
Consumer's accuracy (%)   99.24   93.39   95.15
Overall accuracy (%)      96.08
Kappa coefficient (κ)     0.93


Figure 6-1. Histograms of the three classes for the training data

Figure 6-2. PDFs and decision boundaries


Figure 6-3. Classification using the Bayes classifier (red pixels represent dry sand, green pixels wet sand, and blue pixels water). The overlaid break-lines, digitized manually from the aerial imagery, delineate the dry and wet portions of the beach and the wet and water portions of the beach.


Figure 6-4. Dry sand misclassified as wet sand. The left half of the image shows the misclassified pixels with the ground truth break-lines; the other half shows the same pixels enlarged for better visualization, with the break-lines removed.


Figure 6-5. Wet sand misclassified as dry sand.


Figure 6-6. Classification using the k-means classifier (red pixels represent dry sand, blue pixels wet sand, and green pixels water)


CHAPTER 7
CONCLUSION

An automated approach to classifying beach zones has been presented and its performance assessed. Such a classification is valuable for coastal engineers and researchers concerned with local or zonal statistics of coastal regions, because the traditional methods used for this purpose are labor-intensive. The strenuous task of manual digitization can effectively be eliminated and the final quality of the data significantly improved. The three information classes are difficult to separate using traditional methods, since a lack of aerial imagery degrades the quality of the classified data and allows the intervention of human error. LIDAR intensity provides a good alternative classification feature to elevation, since the variation of surface elevation across a beach zone is minimal. The dominant effects of surface material on the returned electromagnetic radiation, once studied and analyzed for a proposed area, can successfully be leveraged to distinguish different kinds of information classes on the ground.

The use of a multivariate index and a confusion matrix is greatly advantageous, since the classifier's ability to distinguish between various combinations of classes can be studied statistically; hence the method can still be employed if separability is high between some classes and low among others. An overall accuracy of about ninety-five percent and a kappa coefficient greater than 0.90 for both classifiers indicate very good agreement beyond that which might be expected by chance. The classification results exhibit relatively high accuracy values (both producer's and consumer's accuracies) for the dry class when compared with the other two classes, in the supervised as well as the unsupervised case. This is a clear indication that the classifiers are able to distinguish the dry class from the other two classes very well, but that their performance is lower when trying to distinguish between the wet and other


classes. The final accuracy measures depend on a combination of factors, such as the accuracy of the ground truth (about 3 meters) and the cell size of the intensity raster grids (1 meter). Also, the test area used in this analysis has a width approximately equal to the width of one ground truth image (~200 meters). Hence the 3 meter value will limit the expected accuracy of the classification. It must also be noted that the test area chosen in this analysis is a short segment of beach, and the complexity of the spatial distribution of errors may increase when working with much larger data sets. Compared to traditional modes of surveying coastal regions and the mathematical tools used for similar problems, the method proposed here offers a more reliable, faster, and more efficient choice. In light of the increased disturbances to coastal regions worldwide due to global warming, storms, and hurricanes, a faster way to continually monitor coastal health becomes ever more relevant.

There are many interesting aspects that can be examined further as an extension of this work. In addition to intensity, the magnitude of the spatial displacement of the different information classes with reference to a local origin could be exploited as a supporting classification feature. With the advent of more powerful laser systems with pulse rates exceeding one hundred kilohertz, data models with much finer resolutions may be tested against the method presented in this research to study the likelihood of achieving greater classification accuracies. A set of guidelines to aid in choosing optimal parameters for classification could significantly improve results and provide a means to achieve faster turnaround times.


LIST OF REFERENCES

Carter, W.E., Shrestha, R., Tuell, G., Bloomquist, D., Sartori, M., 2001. Airborne Laser Swath Mapping shines new light on Earth's topography. EOS, Transactions, American Geophysical Union, Vol. 82, No. 46, November 2001.

Coastal Mapping Handbook, US Department of Commerce, National Oceanic and Atmospheric Administration, 1978.

Duda, Richard O., Hart, Peter E., Stork, David G., 2000. Pattern Classification, 2nd Edition. Wiley Interscience, New York.

Hu, Xiangyun, Tao, Vincent C., Hu, Yong. Automatic Road Extraction from Dense Urban Area by Integrated Processing of High Resolution Imagery and Lidar Data. IEEE Geoscience and Remote Sensing Letters, Vol. 2, No. 3, July 2005.

Song, Jeong-Heon, Han, Soo-Hee, Yu, Kiyun, Kim, Yong-Il. Assessing the Possibility of Land-Cover Classification Using Lidar Intensity Data. IEEE Geoscience and Remote Sensing Letters, Vol. 2, No. 3, July 2005.

Kaasalainen, Sanna, Ahokas, Eero, Hyyppä, Juha, Suomalainen, Juha. Study of Surface Brightness from Backscattered Laser Intensity: Calibration of Laser Data. IEEE Geoscience and Remote Sensing Letters, Vol. 2, No. 3, July 2005.

Lai, Xudong, Zheng, Xuedong, Wan, Youchuan. A Kind of Filtering Algorithms for Lidar Intensity Image Based on Flatness Terrain. IEEE Geoscience and Remote Sensing Letters, Vol. 2, No. 3, July 2005.

Lutz, E., Geist, Th., Stötter, J. Investigations of Airborne Laser Scanning Signal Intensity on Glacial Surfaces Utilizing Comprehensive Laser Geometry Modeling and Orthophoto Surface Modeling. IEEE Geoscience and Remote Sensing Letters, Vol. 2, No. 3, July 2005.

Luzum, Brian J., Slatton, Clint K., Shrestha, Ramesh L. Analysis of Spatial and Temporal Stability of Airborne Laser Swath Mapping Data in Feature Space. IEEE Transactions on Geoscience and Remote Sensing, Vol. 43, No. 6, June 2005.

Luzum, Brian J., Slatton, Clint K., Shrestha, Ramesh L. Identification and Analysis of Airborne Laser Swath Mapping Data in a Novel Feature Space. IEEE Geoscience and Remote Sensing Letters, Vol. 1, No. 4, October 2004.

Luzum, Brian J., Starek, Michael J., Slatton, Kenneth C. Normalizing ALSM Intensities. GEM Center Report No. Rep_2004-07-001.

Nobrega, R.A.A., O'Hara, C.G. Segmentation and Object Extraction from Anisotropic Diffusion Filtered Lidar Intensity Data. IEEE Geoscience and Remote Sensing Letters, Vol. 2, No. 3, July 2005.

Seber, G.A.F. Multivariate Observations, 2nd Edition. Wiley Interscience, August 2004.


Starek, M.J., Vemula, R.K., Slatton, K.C., Shrestha, R.L., Carter, W.E. Automatic Feature Extraction from Airborne Lidar Measurements to Identify Cross-Shore Morphologies Indicative of Beach Erosion. IEEE International Geoscience and Remote Sensing Symposium, 2007.

Starek, M.J., Vemula, R.K., Slatton, K.C., Shrestha, R.L., Carter, W.E. Shoreline Based Feature Extraction and Optimal Feature Selection for Segmenting Airborne Lidar Images. IEEE International Conference on Image Processing, 2007.

Tso, Brandt, Mather, Paul M., 2001. Classification Methods for Remotely Sensed Data. Taylor & Francis, 2001.

Wright, Dawn, Bartlett, Darius. Marine and Coastal Geographic Information Systems. Taylor & Francis, 2000.


BIOGRAPHICAL SKETCH

The author received a Bachelor of Technology in civil engineering from Jawaharlal Nehru Technological University, India, in 2001.