Principal Component Analysis of Beach Profiles at Matanzas Inlet, Florida
By Susan West
Undergraduate Thesis
Advisor: Dr. Peter Adams
University of Florida
2011

Acknowledgements

I would like to thank Dr. Peter Adams for giving me the opportunity to work on a mathematics-based project. His patience and willingness to argue his point until I understood were invaluable. His idea to let me explore this technique gave me a much greater understanding of more than just linear algebra. I would also like to thank Dr. Scott McKinley in the math department and Dr. Trevor Park in the statistics department for their help in my understanding of the mathematical concepts and interpretations. Finally, I must thank Katherine Malone and everyone else who went out every month to collect the data used in this paper.

Abstract

Beaches adjacent to tidal inlets adjust their morphology in response to changes in wave forcing and changes in inlet conditions. To quantify the temporal changes in beach morphology, a time series of cross-shore profiles may be examined through a principal components analysis to reveal major modes and patterns in the temporal variation. Herein, principal components analysis is first presented via two examples to illustrate the mechanics of the method and then applied to a 14-month record of beach profiles near Matanzas Inlet, Florida. A comparison of the principal components analysis results to those of previous studies suggests that the empirical significance given to the first principal component is consistent with previous studies, but that the second principal component may have identified a different mode of variation in the context of a tidal inlet.

Introduction

Beach morphologies adjacent to coastal inlets are influenced by the assailing wave climate and by sedimentary and hydraulic processes associated with the inlet. Although many inlets are protected by jetties, there are natural inlets that provide opportunities for studying and understanding the influence of inlet processes on beach dynamics. Inlets are affected by tides in a way that does not occur on straight stretches of beach. With the approach of high tide, water is forced through the mouth of the inlet in a current that slows once it reaches the wider bay beyond. With the approach of low tide, all of the added water within the bay exits by the same inlet mouth. This pronounced two-way flow produces subaqueous geomorphic features that differ from those found on straight beaches.

Principal Component Analysis (PCA), or Empirical Orthogonal Function (EOF) analysis, has been used to examine the temporal behavior of beach profiles as well as shoreline positions. Application of PCA to profiles began with Winant et al.
(1975) and Aubrey (1979), who both used data sets from Torrey Pines Beach in California. Many researchers have used variations of the original method. In general, beach topography data are used to generate pairs of eigenvectors and corresponding temporal coefficients. The significance of the eigenvectors is empirically derived from the data. Since the eigenvectors are mutually orthogonal, the modes of variation found through the analysis are free of covariance. In this way, the modes are separated, and how much a single mode contributes to the total variance can be quantified by its corresponding eigenvalue. The temporal coefficients can reveal long-term fluctuations in the shape of the beach profile.

Previous Work

Winant et al. (1975) used EOF analysis on data from Torrey Pines. They applied the technique to cross-shore profiles throughout a 2-year data set recorded from 1972 to 1974. Using uncentered data, they identified a mean beach profile function corresponding to the first eigenvector, a bar-berm function corresponding to the second, and a terrace function corresponding to the third. They effectively captured the seasonal variation in the temporal coefficients of the bar-berm and terrace functions, and found a slight negative slope in the temporal coefficients of the mean profile, suggesting an erosional trend.

Aubrey (1979) used monthly profile data from 1972 to 1977 taken at Torrey Pines. Aubrey established that the first three eigenfunctions described the dominant modes of variance and that the
remaining eigenfunctions could not be separated from noise. Using uncentered data, the first pair of eigenfunctions showed the mean beach profile with no net erosion or accretion. The second pair described the bar-berm pattern with a clear seasonal trend. The third pair described the low-tide terrace pattern with complicated time dependence.

Pruszak (1993) used cross-shore data from the Baltic Sea coast taken at varying intervals between 1964 and 1991, and from the coast of the Black Sea from 1972 to 1978. Pruszak assumed an ... curve was used to determine the matrix in his eigenfunction analysis. His analyses of coastlines on the Baltic Sea and the Black Sea had similar results in that the first three eigenfunctions described the majority of the variance. The first eigenfunction corresponded to the mean beach profile complete with bars and berms. The second and third eigenfunctions efficiently described the various locations and intensities of erosional and accretional features. In his study, the temporal coefficients were not emphasized.

Gao et al. (1998) used data recorded infrequently along seven cross-shore transects at two beaches in southern England from 1974 to 1989. They used the calculated slopes from one point to the next on the transect to form the matrix used in their eigenfunction analysis. They used the first eigenvector and corresponding temporal coefficient as evidence for equilibrium. The first temporal coefficients showed evidence of a steady state with the exception of two peaks. Reasonable causes for the peaks were proposed, but there were no wave records to confirm them. Whatever the cause, the beach recovered quickly to the steady state. The first eigenvector was then used to propose an equilibrium profile.

Miller and Dean (2007) applied EOF analysis to shoreline position data for three sites. All three sites were known to have a seasonal wave climate.
They also found that the first three eigenfunctions explained the majority of the variance. Using uncentered data, they also found a strong correlation between the first eigenvector and the mean shoreline position. The correlations they found with the second and third eigenfunctions were examined site by site. For instance, in their study at Duck, North Carolina, the second eigenfunction seems to show a pattern of erosion and accretion with a nodal point located at a pier. The third mode had a local extremum at the pier with multiple temporal fluctuations. The main frequency was low, on the order of 0.1 cycles per month, while the others were fluctuations measured in years. Due to the large time frames of their data sets, they were also able to detect temporal oscillations, both annual and on larger scales.

Scientific Problem Statement

EOF analysis has proven to be a powerful analytic technique and versatile in how it can be applied. Researchers can define the original matrix based on what they would like to study and easily find the dominant modes of variation. The intent of this study is to apply PCA to the beach topography adjacent to the autonomous inlet at Matanzas to see what dominant modes of variance are present. Specifically, this thesis is motivated by the following questions:

1. Does the beach adjacent to the Matanzas Inlet show similar results to those of earlier PCA studies?
2. Does the PCA of the Matanzas Inlet beach data reveal a seasonal variability?
3. What implications do these data suggest regarding differences in the modes of variance at the mouth of the inlet versus the long stretch of beach to the north of the mouth?
4. Do the temporal coefficients resulting from the PCA suggest a dependence on a temporally varying process such as wave climate?

Site Description

The Matanzas Inlet lies on the North Florida Atlantic coast at the southern end of Anastasia Island. The inlet has been left in its natural state in that no man-made structures have been built on the beach to the north. Because Fort Matanzas was originally built at the mouth of the inlet, many maps of the area have been drawn, showing the changing shape of the coastline and inlet. According to ... 1200 m since the fort was built in 1742 (Malone, 2011). The study area is the beach to the north of Matanzas Inlet. The northern portion of the beach is relatively straight. The southern portion curves to the west following the mouth of the inlet.

Methodology

The general data set consists of elevations across 30 cross-shore transects along the northern beach adjacent to Matanzas Inlet. Data were collected monthly over a 14-month period. PCA was performed on each of the 30 transects. Each transect yielded a matrix of unique size to maximize the usable data set. For example, transect 15 had the most usable data along 51 locations over the 14 observation times. So the data matrix used was 14x51.

This paper includes the PCA details for three representative transects. Transect 15 shows the dominant patterns found in the northern end of the study area. Transect 27 shows the dominant patterns found in the southern end of the study area. Transect 30 is shown as one of only three transects whose principal eigenvector shows a zero crossing.

General PCA method

The EOF method uses a combination of statistics and linear algebra to isolate the dominant patterns in a data set. The process itself is easier to understand with a small example. Imagine we have n observations of p variables. This would yield a data matrix whose dimensions are n rows and p columns.
For the sake of providing a tangible example, let us say that n = 3 and p = 4, yielding 4 cross-shore locations where elevations (Z) are measured at 3 different times. The mean elevation at each cross-shore position over time is given by
where n is the number of temporal observations.

The distance is squared so that positive and negative values do not cancel each other out. Since the value of the mean has been fixed, that is, not allowed to vary, one degree of freedom has been lost. The sum is then divided by (n - 1), the number of values that remain free to vary. The standard deviation is given by

The n x p data matrix is used to find the variance-covariance matrix. Statistically, the variance of a single variable over time is given by

The covariance over time between two variables, when i ≠ j, is given by the expanded form

The variance and covariance over time can be expressed in matrix form as

In matrix form, the variance-covariance matrix can be computed by first subtracting the mean,

Then multiply on the right by the transpose and divide by (n - 1)
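
For reference, the scalar definitions described in this passage can be written in standard notation. The symbols are assumptions chosen for this sketch rather than the thesis's own: Z_{ti} is the elevation at time t (t = 1, ..., n) and cross-shore position i (i = 1, ..., p), \bar{Z}_i is the mean over time at position i, s_i^2 and s_i are the sample variance and standard deviation, and s_{ij} is the sample covariance between positions i and j.

```latex
\begin{align*}
\bar{Z}_i &= \frac{1}{n}\sum_{t=1}^{n} Z_{ti} \\
s_i^2 &= \frac{1}{n-1}\sum_{t=1}^{n}\left(Z_{ti}-\bar{Z}_i\right)^2,
  \qquad s_i = \sqrt{s_i^2} \\
s_{ij} &= \frac{1}{n-1}\sum_{t=1}^{n}\left(Z_{ti}-\bar{Z}_i\right)\left(Z_{tj}-\bar{Z}_j\right),
  \qquad i \neq j
\end{align*}
```

Setting j = i in the covariance formula recovers the variance, which is why the variances sit on the diagonal of the variance-covariance matrix.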
Since the covariance of Z at time 1 and time 2 is the same as the covariance of Z at time 2 and time 1, the variance-covariance matrix is a square, symmetric matrix. That is, there are as many columns as there are rows and the matrix is equal to its transpose. A property of an n x n symmetric matrix is that it yields n real-valued eigenvalues and n corresponding eigenvectors.

The hand-worked method of finding eigenvalues and eigenvectors is easiest to understand with a 2x2 matrix example. One can imagine how time consuming this would be for larger matrices, as determinants yield higher-degree polynomials, so once the process is understood it is preferable to let computer software such as MATLAB, which has a built-in command, handle eigenvalues and eigenvectors. Eigenvectors found with MATLAB are reported as unit vectors.

To find the eigenvector, u, corresponding to the eigenvalue, set the matrix equal to a column of 0, 0 and use Gaussian row reduction to solve for the vector. Divide by the length for the unit eigenvector.

The impact of this result is illustrated in principal component analysis. In the Cartesian coordinate system, data can plot as some scalar multiplied by the unit vectors along the X, Y, and Z orthogonal axes. The data in 3-dimensional space can be viewed as describing the shape of an ellipsoid. The ellipsoid can also be described by scalars and orthogonal unit vectors in the principal directions, whose axes are not necessarily parallel to the X, Y, and Z axes. This new set of axes is given by the eigenvectors, the direction of the axes, and the eigenvalues, the magnitude in the direction of the corresponding eigenvector.
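
The hand-worked 2x2 procedure described above can be sketched in plain Python. The matrix [[2, 1], [1, 2]] is a made-up example, not the one from the thesis, and `eig_sym_2x2` is a hypothetical helper name. The sketch assumes a symmetric matrix with a nonzero off-diagonal entry: it solves the characteristic polynomial with the quadratic formula, then reads each eigenvector off the first row of (A - lambda I) and divides by its length.

```python
import math


def eig_sym_2x2(a, b, d):
    """Eigenvalues and unit eigenvectors of the symmetric matrix [[a, b], [b, d]].

    Assumes b != 0 so that the first row of (A - lam*I) determines the eigenvector.
    """
    if b == 0:
        raise ValueError("this sketch assumes a nonzero off-diagonal entry")
    # Characteristic polynomial: lam^2 - (a + d) * lam + (a*d - b*b) = 0
    trace = a + d
    det = a * d - b * b
    # The discriminant equals (a - d)^2 + 4*b^2, so it is never negative.
    disc = math.sqrt(trace * trace - 4.0 * det)
    pairs = []
    for lam in ((trace + disc) / 2.0, (trace - disc) / 2.0):
        # First row of (A - lam*I) u = 0 reads (a - lam)*x + b*y = 0,
        # which is satisfied by u = (b, lam - a).
        x, y = b, lam - a
        length = math.hypot(x, y)
        pairs.append((lam, (x / length, y / length)))
    return pairs


(lam1, u1), (lam2, u2) = eig_sym_2x2(2.0, 1.0, 2.0)
print(lam1, u1)
print(lam2, u2)
```

For this example the eigenvalues sum to the trace (3 + 1 = 4) and the two unit eigenvectors are orthogonal, which is the property the rest of the method relies on.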
We begin with the variance/covariance matrix.

Another useful feature of the variance/covariance matrix is that the trace, given by adding the entries on the diagonal, will equal the sum of the eigenvalues.

will yield

and,

The three eigenvectors that correspond to the eigenvalues are given by

We now have two coordinate axes for the same data points. Just as a point can be seen as a linear combination of the x, y, and z unit vectors multiplied by scalars, it can also be seen as a linear combination of the eigenvectors multiplied by scalars, which are commonly called the principal component scores. To find the scores, multiply the centered data on the left by the transposed eigenvectors.

The eigenvectors display the dominant patterns in the data through time. The coefficients, or scores, display the dominant patterns in the data across position. Multiplying the eigenvectors by the scores returns the centered data.
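
The score computation and the reconstruction described above can be written compactly. The notation is assumed for this sketch and is not taken from the thesis: E is the 3x3 matrix whose columns are the unit eigenvectors e_1, e_2, e_3, Z_c is the centered data, C is the matrix of scores, and z is a single column of Z_c with coefficients c_1, c_2, c_3.

```latex
\begin{align*}
C &= E^{\mathsf{T}} Z_c \\
E\,C &= E E^{\mathsf{T}} Z_c = Z_c
  \qquad \text{since } E E^{\mathsf{T}} = I \\
z &= c_1 e_1 + c_2 e_2 + c_3 e_3 \\
e_1^{\mathsf{T}} z &= c_1\,(e_1^{\mathsf{T}} e_1) + c_2\,(e_1^{\mathsf{T}} e_2) + c_3\,(e_1^{\mathsf{T}} e_3)
  = c_1 \cdot 1 + c_2 \cdot 0 + c_3 \cdot 0 = c_1
\end{align*}
```

The last line is the orthogonality argument in symbols: taking the inner product with one unit eigenvector removes the contributions of the other two and leaves only that eigenvector's coefficient.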
Solving for the coefficients, c, is possible because of the orthogonality of the eigenvectors. For a column of centered elevations Z

Z, however, have components of all three eigenvectors. But the inner product of a unit eigenvector with itself is 1 and the inner product of a vector with an orthogonal vector is 0. In this manner, the specific coefficient for the first eigenvector that applies to the components of the elevations, Z, can be isolated.

3-D Example

The picture of what is happening can be shown in a three-dimensional example.

Begin by centering the data.

Calculate the variance/covariance matrix.
By MATLAB,

Eigenvalues =

Eigenvectors =

Total variance = 14.5833

Solve for the coefficient matrix,

We can reconstruct the centered data,
Adding the mean back on gives us the original data.

The third eigenvalue accounts for very little of the variance in the data and could be eliminated as noise by replacing the third eigenvector with a column of 0, 0, 0.

By rotating the box, we can see that the red stars, representing the reduced data, now lie along a plane that is not parallel to the X, Y, or Z axis.
Eliminating the variance due to the second eigenvector as well leaves the points along a line shown by the blue stars. We can reduce the data to each vector individually to see the rotated axes.
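
The reduction steps of the 3-D example can be sketched in Python with NumPy. Everything specific here is an assumption: the data values are made up (the thesis's example matrix is not reproduced), the layout of 3 coordinates in rows and 4 points in columns is chosen for illustration, and `np.linalg.eigh` stands in for the MATLAB built-in.

```python
import numpy as np

# Hypothetical data: 3 coordinates (rows) for 4 points (columns).
Z = np.array([[1.0, 2.0, 4.0, 5.0],
              [2.0, 3.0, 3.0, 6.0],
              [0.5, 1.5, 2.0, 2.0]])

row_mean = Z.mean(axis=1, keepdims=True)
Zc = Z - row_mean                        # center each row
S = Zc @ Zc.T / (Zc.shape[1] - 1)        # 3x3 variance/covariance matrix

# eigh returns ascending eigenvalues with unit eigenvectors in the columns;
# reorder so the largest eigenvalue comes first.
eigvals, E = np.linalg.eigh(S)
order = np.argsort(eigvals)[::-1]
eigvals, E = eigvals[order], E[:, order]

C = E.T @ Zc                             # scores (coefficient matrix)

# Replace the third eigenvector with zeros: the points fall onto a plane.
E_plane = E.copy()
E_plane[:, 2] = 0.0
Z_reduced = E_plane @ C + row_mean

# Zero the second eigenvector as well: the points fall onto a line.
E_line = E_plane.copy()
E_line[:, 1] = 0.0
Z_line = E_line @ C + row_mean

print("fraction of variance per mode:", eigvals / eigvals.sum())
```

With all three eigenvectors kept, `E @ C + row_mean` returns the original data exactly; the squared error introduced by dropping a mode is proportional to that mode's eigenvalue.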
The 3x1 eigenvectors will reflect the dominant patterns in the original data matrix rows. Because the mean of each row has been subtracted, the dominant pattern in this case is the variance. Keep in mind that the eigenvector is a unit vector and will not replicate the variance.

The first 4x1 component score will reflect the dominant patterns in the data columns. Since the mean of the columns has not been subtracted, the mean in this case is the dominant pattern. Again, the score is a representation of the magnitude on a unit vector and will only reflect the shape of the pattern.
A point should be made about the relationship between the eigenvectors and the scores. To get the variance/covariance matrix, we multiplied on the right by the transpose to produce a 3x3 matrix. If we were to multiply on the left by the transpose instead of the right, we would have a 4x4 symmetric matrix that would produce similar results. The first eigenvector is now 4x1 and reflects the dominant pattern in the data columns, which again have not been centered.
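
The relationship between the two products can be checked numerically. The following NumPy sketch uses a made-up 3x4 centered matrix (not the thesis's example) and divides both products by the same constant. It compares the eigenvalues of the 3x3 and 4x4 matrices and checks that the first score from the 3x3 analysis has the same shape, up to sign and scale, as the first eigenvector of the 4x4 matrix.

```python
import numpy as np

# Hypothetical centered data: each row sums to zero.
X = np.array([[-2.0, -1.0,  1.0, 2.0],
              [-1.5, -0.5, -0.5, 2.5],
              [-1.0,  0.0,  0.5, 0.5]])

small = X @ X.T / 3.0      # multiply on the right by the transpose: 3x3
large = X.T @ X / 3.0      # multiply on the left by the transpose: 4x4

lam_small, E = np.linalg.eigh(small)
lam_large, V = np.linalg.eigh(large)
# eigh sorts ascending; reverse so the dominant mode comes first.
lam_small, E = lam_small[::-1], E[:, ::-1]
lam_large, V = lam_large[::-1], V[:, ::-1]

scores = E.T @ X                                   # scores from the 3x3 analysis
first_score_unit = scores[0] / np.linalg.norm(scores[0])
alignment = abs(first_score_unit @ V[:, 0])        # 1.0 means identical shape

print(lam_small)
print(lam_large)
print(alignment)
```

The three eigenvalues of the 3x3 matrix match the three largest eigenvalues of the 4x4 matrix, and the extra fourth eigenvalue is zero to within rounding.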
The first component score is now 3x1 and reflects the dominant pattern in the data rows.

Because of this relationship, the computation of the scores may not be necessary. The eigenvectors and scores are only representations of patterns, so the eigenvectors produced by flipping the multiplication of the centered data can be used interchangeably with the original scores. Even though one covariance matrix is 3x3 and the other is 4x4, the three eigenvalues from the first will be the same as the first three eigenvalues from the second. The use of the lower eigenvalues is better seen on a larger data set.

Method for Matanzas

For the Matanzas study, beach elevation data were gathered once a month for a period of 14 months from January 2009 to February 2010 (Malone, 2011). The data were used to generate thirty cross-shore transects that were used for this study. The 330 m long transects lie orthogonal to shore, anchored to the proto-dune base line at the 100 m point. At this 100 m point, the transects are spaced 50 m apart. The elevation data points along individual transects are spaced 1 m apart. The elevations themselves were interpolated from the original data.

Fig 1: Aerial view of the beach adjacent to Matanzas Inlet with overlain transects (Malone, 2011).

Fig 2: Interpolated elevations of the study area given in meters by color.

For each cross-shore transect, a matrix of elevations was used, with rows indexed by cross-shore position and columns indexed by time. Nesting habitats and varying tidal ranges created inconsistencies in the range of cross-shore coverage month to month. Due to these inconsistencies in the gathered data, a MATLAB program was used to isolate a block of data with no missing data points over as many months as possible. The program isolated the string of data for each month that contained no holes and reported the minimum cross-shore position and the maximum cross-shore position. Any months that significantly limited the usable range of data points were excluded. For example, for transect 15, the usable data that has no holes for all 14 months is from 103 meters to 153 meters.
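
The hole-finding step can be illustrated with a short Python sketch. This is not the MATLAB program used in the study; `longest_complete_run` is a hypothetical helper, the sample profile is made up, and the choice of the longest hole-free run is an assumption about how the string of data was selected for each month.

```python
import math


def longest_complete_run(positions, elevations):
    """Return (min_position, max_position) of the longest run with no missing
    elevations, or None if every value is missing.

    Missing values are represented by None or NaN.
    """
    best = None      # (length, start_index, end_index) of the best run so far
    start = None     # start index of the run currently being scanned
    for i, z in enumerate(elevations):
        missing = z is None or math.isnan(z)
        if missing:
            start = None
            continue
        if start is None:
            start = i
        length = i - start + 1
        if best is None or length > best[0]:
            best = (length, start, i)
    if best is None:
        return None
    return positions[best[1]], positions[best[2]]


nan = float("nan")
positions = list(range(98, 108))          # cross-shore positions in meters
elevations = [1.2, nan, 1.1, 1.0, 0.9, 0.8, nan, 0.5, 0.4, nan]
usable = longest_complete_run(positions, elevations)
print(usable)
```

Running this on each month's profile would give one minimum and one maximum position per month, which is the form of the values reported for transect 15.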

Fig 3: The overlay showing transect 15 imposed on the interpolated elevation data for January 2010. The colors given by the color bar represent elevation relative to sea level.

The minimum and maximum cross-shore positions for each month for transect 15 are given by:

Month       Minimum (m)   Maximum (m)
Jan 2009     90           153
Feb 2009     92           190
Mar 2009     92           179
Apr 2009    103           194
May 2009    101           179
Jun 2009    101           164
Jul 2009    103           192
Aug 2009    101           201
Sep 2009    103           188
Oct 2009    101           159
Nov 2009    101           197
Dec 2009    103           157
Jan 2010     99           232
Feb 2010    101           177

Full range = 90 to 232 m
Restricted range = 103 to 153 m

January of 2009 limits the maximum at 153 meters, yet eliminating the January data will only increase the maximum by 4 meters to 157 meters, since the data from December 2009 stops at 157 meters. There are four months that restrict the lower bound to 103 m, so there is no practical way to extend the lower end of the range. January 2009 was typically the month that limited the use of the data, but unless the gain in the data set was at least 10 meters, the data from January was used in this analysis.

Fig 4: Full cross-shore profiles for transect 15 with a box around 103 m to 153 m. Note the berm feature present during the summer months.
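
The range restriction for transect 15 can be reproduced from the monthly minimum and maximum positions listed above. The following Python sketch is an illustration, not the program used in the study; `common_range` is a hypothetical helper, and the list is assumed to run in order from January 2009 to February 2010.

```python
# (minimum, maximum) usable cross-shore position in meters for each month,
# assumed to be ordered from January 2009 through February 2010.
monthly_ranges = [
    (90, 153), (92, 190), (92, 179), (103, 194), (101, 179),
    (101, 164), (103, 192), (101, 201), (103, 188), (101, 159),
    (101, 197), (103, 157), (99, 232), (101, 177),
]


def common_range(ranges):
    """Range covered by every month: largest minimum to smallest maximum."""
    return max(lo for lo, _ in ranges), min(hi for _, hi in ranges)


full_range = (min(lo for lo, _ in monthly_ranges),
              max(hi for _, hi in monthly_ranges))
restricted = common_range(monthly_ranges)
without_january = common_range(monthly_ranges[1:])   # drop January 2009

# Gain in coverage, in meters, from excluding January 2009.
gain = (without_january[1] - without_january[0]) - (restricted[1] - restricted[0])
keep_january = gain < 10          # the 10 m threshold stated in the text

# Positions are spaced 1 m apart, so the count includes both end points.
n_positions = restricted[1] - restricted[0] + 1

print(full_range, restricted, without_january, gain, keep_january, n_positions)
```

The restricted range of 103 m to 153 m contains 51 positions at 1 m spacing, matching the 51 locations used for transect 15.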

Fig 5: Cross-shore profile for transect 15 restricted to the 103 m to 153 m area.

Fig 6: Cross-sectional profiles for transect 15 overlain by the statistical mean and variance of the data from 103 m to 153 m.

The data matrix for transect 15 is a 51x14 matrix of elevations.
The mean of the elevations across time was subtracted for each position, that is, the mean of each column was subtracted from the matrix. The centered matrix was then multiplied on the left by its transpose and divided by (n - 1) = 13, producing a 14x14 variance/covariance matrix whose eigenvectors would then relate to the mean elevation through time and whose scores would relate to the spatial, cross-shore variance. The alternative way to find the spatial, cross-shore variance would be to multiply the centered matrix on the right by its transpose and divide by (n - 1). The eigenvectors of the 51x51 matrix would show the same pattern as the scores of the 14x14 matrix.

It is important to point out that the eigenvectors and corresponding scores are a unitless mathematical concept. In this type of analysis, the patterns that they represent are found within the data. To be consistent with previous work for the sake of comparison, the eigenvectors, as they have been calculated here, are a function of the cross-shore position, represented by e(x). The component scores are a function of time, represented by s(t).

The total variance for transect 15 is 7.9130. The first eigenvalue is 7.3298, which accounts for 92.63% of the variance. The second eigenvalue is 0.3119, which accounts for 3.94% of the variance. The third eigenvalue is 0.1971, which accounts for 2.49% of the variance. The first three eigenvalues account for 99.06% of the variance.

Eigenvectors, e(x)

Fig 7: Graphical representation of the first three eigenvectors corresponding to the dominant patterns in the cross-shore elevations.

The first eigenvector reflects the shape of the cross-shore variance over the 51 m covered. The entire stretch is positively correlated in that the full length along the transect will move up or down together, rather than some areas increasing while other areas decrease. The profile has more variance in the center than at either end of the cross-shore length.
A possible reason for the higher amount of variance is the presence of a berm, which can be seen in the red summer months of the profile data (Fig 5).
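
The percentages quoted for transect 15 follow directly from the eigenvalues and the total variance. A minimal Python sketch of that arithmetic, using only the values reported above (the variable names are chosen for illustration):

```python
total_variance = 7.9130
leading_eigenvalues = [7.3298, 0.3119, 0.1971]

# Share of the total variance carried by each of the first three modes.
percent = [round(100.0 * lam / total_variance, 2) for lam in leading_eigenvalues]

# Combined share of the first three modes, and what is left for the rest.
cumulative = round(100.0 * sum(leading_eigenvalues) / total_variance, 2)
remaining = round(100.0 - cumulative, 2)

print(percent, cumulative, remaining)
```

The remainder is the share attributed to the lower modes, which the analysis treats as noise.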

Fig 8: Statistical variance cross-shore plotted with the first spatial eigenvector. The 51 unit vectors follow the shape of the variance.

The second and third eigenvectors represent a significantly lower proportion of the variance. Because the second and third eigenvalues represent a similar amount of the remaining variance, it is hard to distinguish whether they represent a significant pattern or are just part of the background noise. The 48 lower eigenvectors that account for the remaining 0.94% of the variance are considered noise.

Principal component scores, s(t)

Fig 9: Graphical representation of the first three principal component scores corresponding to temporal patterns in elevation.

The first principal component score reflects the shape of the mean elevation through the 14-month time period. It represents the relative magnitude and direction, gain or loss, over the 51 m of profile. Together, the eigenvector and the PC score show us the picture of what is happening at transect 15. The score shows the net increase or decrease through time and the eigenvector shows how an increase or decrease will affect the cross-shore elevations. That is, areas of greater variance will be affected more than the areas of lesser variance.

Fig 10: Statistical mean of elevation through time. The mean multiplied by 5 is shown in magenta to provide a comparison of the shape. Since the score shows the magnitude of the corresponding unit eigenvector, it is easier to visually compare to a multiple of the statistical mean.

Results and Analysis

The first eigenvectors for all but three transects were positively correlated, that is, they did not cross zero. The scores for those transects reflect the increase and decrease through time. A simple regression line applied to the scores reflects either a net increase or decrease in elevation over the full time period, given by a positive or negative slope.
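
The regression line applied to the scores can be sketched as an ordinary least-squares fit. The scores below are made-up values for illustration only, not results from any transect, and `trend_line` is a hypothetical helper name.

```python
def trend_line(times, scores):
    """Ordinary least-squares slope and intercept of scores against times."""
    n = len(times)
    t_mean = sum(times) / n
    s_mean = sum(scores) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (s - s_mean) for t, s in zip(times, scores))
    slope = sxy / sxx
    intercept = s_mean - slope * t_mean
    return slope, intercept


months = list(range(1, 15))                          # 14 monthly observations
hypothetical_scores = [3.0 - 0.5 * m for m in months]  # made-up, exactly linear

slope, intercept = trend_line(months, hypothetical_scores)
direction = "net decrease" if slope < 0 else "net increase"
print(slope, intercept, direction)
```

The sign of the slope is only meaningful together with the sign of the corresponding eigenvector, since the pair can be multiplied by -1 without changing the reconstructed data.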

Fig 11: Trend line applied to transect 15 first temporal score.

Using the temporal scores associated with the first eigenvectors, the beach is divided into an area of decreasing elevation through time to the north of transect 18, and an area of increasing elevation through time to the south of transect 18. Transect 27 is an example of the increase. Similar results were found in the study of beach slopes and widths using the same data (Malone, 2011).

Fig 12: First three eigenvectors for transect 27.

Fig 13: First three temporal scores for transect 27.

Fig 14: Trend line applied to transect 27 first temporal score.

Since the mean has been subtracted through time, the first eigenvector here is analogous to previously studied bar-berm functions (Winant et al., 1975). Being a quieter stretch of beach, the zero crossing is not seen until farther into the mouth of the inlet, and even then, is not consistently seen. However, south of transect 18, this first function accounts for less of the variance than it did in the northern area. In the north, the second mode of variance accounts for 2.58% to 15.08% of the total variance. In the south, the second mode accounts for 4.24% to 37.45% of the overall variance.

Looking back to Fig 1, it is clear that the transects north of transect 18 are approximately parallel to each other, reflecting the approximately straight stretch of beach, while those south of 18 are angled relative to each other as the beach curves around the spit. Between transects 18 and 30, the beach face direction changes by an angle of 87 degrees. This means the angle of incoming waves, and therefore wave energy, also varies. The tides also play a large part in the geomorphology close to the mouth of the inlet.

Fig 15: The first three eigenvectors for transect 30. Note that the first eigenvector crosses 0 at approximately the 146 m point.

Fig 16: The first three temporal scores for transect 30.

Fig 17: The first temporal score for transect 30 showing a trend line with a negative slope.

The zero crossing of the first eigenvector indicates that part of the beach is gaining in elevation, while part is losing elevation. The empirical reason for this is the presence of a berm. This is consistent with the bar-berm function in previous works (Winant et al., 1975). Since the calculated sign of the eigenvector is arbitrary, it can be flipped to show a more reasonable pattern, with the berm building up inland. The sign then also flips for the score, which yields an overall increase that is consistent with the other transects in this area. In this transect, the power of principal component analysis is clear. A simple look at the statistical variance would miss this feature since variance is non-negative.

The second eigenvector accounts for a relatively high amount of the variance. In previous work, this eigenvector has been called the terrace function (Winant et al., 1975). Empirically, this function may correlate to a feature seen adjacent to tidal inlets. The third eigenvector accounts for 0.77% to 4.74% of the variance in the northern area and up to 11.51%, as in transect 25, in the southern area.

Conclusions

Principal component analysis of the beach at Matanzas Inlet has yielded results similar to previous work done on straight beaches. Because the mean elevation through time has been subtracted
off, the first eigenvector corresponds to the bar-berm function found by Winant et al. (1975), Miller and Dean (2007), and others. Their label of the function stems from two zero crossings, one indicating the onshore berm and the other indicating the offshore bar. Matanzas lacks a significant buildup of sand in the backshore along the straight northern stretch of beach. The first eigenvector indicates the presence of a small berm in areas where variance is higher along the transect. This first eigenvector accounts for the majority of the variance. However, in the southern section of the study area, the first eigenvector accounts for less of the variance, especially in transects that show a zero crossing. This indicates that the second mode of variation plays a more significant role in the southern section. This mode, called the terrace function for straight beaches, could possibly correspond to an ebb-tide delta.

In future research, a few considerations may yield more results. With consideration to the first temporal score, a longer data set could reveal whether seasonal changes can be captured in this kind of analysis. The 2009 hurricanes had little impact on this part of Florida, so 2009 by itself tells us little, but as part of a larger data set containing years of more active storm seasons, it would tell us if there is a seasonal pattern or if beach recovery is so quick that no pattern emerges.

With consideration to the second spatial eigenvector, this mode of variance may be tidal in nature. One way to find out if this is true would be to analyze a denser data set. Resolving tidal fluctuations would require that the measurements be taken much more often, weekly if not more. Finding a correlation between monthly fluctuations and perigee/apogee cycles would be strong evidence.

The third consideration would be consistent coverage alongshore. This would allow for the use of combined eigenfunctions to reconstruct the data in a three-dimensional model.
This would show how the individual modes affect the beach along shore.

References

Aubrey, D.G., 1979. Seasonal patterns of onshore/offshore sediment movement. Journal of Geophysical Research 84(C10), 6347-6354.

Gao, S., Collins, M., Cross, J., 1998. Equilibrium Coastal Profiles: II Evidence from EOF Analysis. Chinese Journal of Oceanology and Limnology 16(3), 193-205.

Malone, K., 2011. Seasonal and spatial variability of beach morphodynamics at an autonomous tidal inlet: Matanzas Inlet, Florida Atlantic Coast. Graduate thesis, University of Florida.

Miller, J.K., Dean, R.G., 2007. Shoreline variability via empirical orthogonal function analysis: Part I temporal and spatial characteristics. Coastal Engineering 54(2007), 111-131.

Pruszak, 1993. ... functions. Coastal Engineering 19(1993), 254-261.

Winant, C.D., Inman, D.L., Nordstrom, C.E., 1975. Description of seasonal beach changes using empirical eigenfunctions. Journal of Geophysical Research 80(15), 1979-1986.