THE MULTIDISCIPLINARY PERSPECTIVE OF CELL SHAPE

By

STEPHEN HUGO ARCE

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

2014
© 2014 Stephen Hugo Arce
To Mom, Dad and Megan
ACKNOWLEDGMENTS

I would like to thank my Mom and Dad and sister, Natalie, for all of the support and guidance they have provided. My first memories are of reading books and always sharing a good meal, or of fixing things around the house or playing soccer games over the weekends, and those experiences and lessons in kindness and diligence have carried me through the rigors of graduate school and life in general. Megan, I love you, Kokosnoot. You have balanced my life and continue to provide me with the common sense I lack at times, and I look forward to our life together, whatever may come next.

I would like to thank my advisor, Prof. Yiider Tseng, for being my greatest ally over the past few years; he has helped me to grow as a person as well as a scholar. I would also like to thank Prof. Spyros Svoronos, who let me into the department in the first place and whose enthusiasm for teaching has been a constant inspiration. My thanks to Cynthia, Shirley and all of the ladies who keep things going in the background. I owe an enormous thank you to Mr. Wade and to the rest of the OGMP, as well as Charles Jackson, Dr. Moorehouse and the FEF for giving me the means to keep working hard. I also want to recognize Dr. Donnelly for her work with SEAGEP.

I wish to thank Casey LaMarche, a real-life superhero. Sinan Dag, thank you for setting the stage, and Pei Hsun Wu, thank you for introducing me to microscopy and MATLAB. Thank you, Jun, our conversations have been key, and thank you Shen Hsiu, your example is hard to follow. NSC402, your hard work was appreciated. Charlie, Marshall, Bruno, Heather, Kyle, Thomas, Ahren, Virat, Joe and the entire crew of Risky Business, thanks for the downtime, and to all of my friends and family that kept me afloat. You have all been most supportive; I could not have done it without you.
TABLE OF CONTENTS

page

ACKNOWLEDGMENTS ..... 4
LIST OF FIGURES ..... 7
ABSTRACT ..... 14

CHAPTER

1 INTRODUCTION ..... 16
    Motivation ..... 17
    GTPases ..... 21
    MicroRNA ..... 23
    Epithelial to Mesenchymal Transition ..... 25
    Multidisciplinary Perspective ..... 27
    Background on Digital Images ..... 29
        Digital Image Format ..... 29
        Background Subtraction ..... 30
        Noise ..... 32
        Point Spread Function ..... 33
        Digital Image Resolution ..... 34
        Spatial Frequency ..... 35
    Background on Image Acquisition ..... 36
        Objectives and Magnification ..... 36
        Fluorophores ..... 37
        Temporal Resolution ..... 39

2 NEW IMAGE PROCESSING TECHNIQUES ..... 48
    Processing Noise ..... 48
        Image Cropping ..... 48
        Robust Statistics ..... 49
        Expectation Maximization ..... 50
        Gaussian Blurring ..... 52
        Wavelet De-noising ..... 53
        Anisotropic Diffusion ..... 56
        Dinosort ..... 58
    Low Spatial Frequencies ..... 61
        Jansson-Van Cittert Deconvolution ..... 61
        Built-in MATLAB Deconvolution Methods ..... 63
        Automated Spatial Filtering ..... 64
    Detecting Objects ..... 68
        Segmentation ..... 69
        Edge Detection ..... 70
        Graph Cuts ..... 71
        Level Sets ..... 71
        Level Sets and ASF ..... 73
        Detecting the Cytoskeleton ..... 74
        Summary ..... 74

3 QUANTIFICATION OF CELL SHAPE BY DASF IMAGE PROCESSING ..... 109
    Cell Displacement ..... 109
        Centroids ..... 109
        The Distance Transform ..... 111
        Principal Component Analysis ..... 112
        Protrusion Analysis ..... 113
    Cell Shape ..... 114
        Circularity ..... 115
        Eccentricity ..... 116
        Symmetry ..... 117
    Measuring Changes in Cell Shape ..... 119
    Manifolds and Geodesics ..... 123
        Projections onto Shape Space ..... 124
        Geodesic Distance and Boundary Curvature ..... 125
        Shape Clustering ..... 126
        Mean Cell Shape ..... 128
    Summary ..... 129

4 CONCLUSIONS AND RECOMMENDATIONS ..... 153

APPENDIX: METHODS ..... 157

REFERENCES ..... 159

BIOGRAPHICAL SKETCH ..... 169
LIST OF FIGURES

Figure    page

1-1 A screenshot demonstrates how to load three types of images into MATLAB and how the imread function automatically chooses the variable class. ..... 41
1-2 An improvised blank image may be constructed by piecing together the two opposite sides of different frames of a time series to achieve background subtraction. ..... 42
1-3 An intensity histogram for an image with significant intensity inhomogeneity (uneven lighting) and some debris in the background as well as a single object of interest. ..... 43
1-4 An intensity histogram for a blank image, comprised of intensity inhomogeneity and debris. ..... 43
1-5 An intensity histogram for a background-subtracted image. ..... 44
1-6 A portion of a fluorescence image is cropped to display pixel noise. ..... 44
1-7 An intensity histogram of pixel noise fit with a Gaussian distribution of the same mean and standard deviation as the noise sample has an R² value of 0.9498. ..... 45
1-8 A QQ plot further demonstrates the Gaussian nature of sampled pixel noise. ..... 45
1-9 The Airy disk is shown, cut at 2.5% of its maximum intensity to better visualize the outer rings. ..... 46
1-10 A simulated disk shape (left) convolved with a narrow Airy pattern (middle) and a broad Airy pattern (right). ..... 46
1-11 A 2-D Gaussian curve is often used to approximate the Airy pattern, the theoretical PSF for diffraction-limited microscope systems with a circular aperture. ..... 47
1-12 Photobleaching can quench the fluorescence signal from a sample, making certain cellular features difficult to extract. ..... 47
2-1 Manual cropping of images to assess noise may yield drastically different segmentation results. ..... 77
2-2 The histogram of pixel intensities at the periphery of an image reveals signals from fluorescent cells that could skew estimates of noise statistics. ..... 77
2-3 The mean and standard deviation of periphery pixels are skewed by cell signals at the edges of the image, but the median and median average deviation are more robust to outlying signals. ..... 78
2-4 A histogram of image intensities is compared to the estimated noise distribution (red line) derived from the expectation maximization algorithm. ..... 78
2-5 The normalized noise distribution (red) and signal distribution (blue) estimations derived from the expectation maximization algorithm are plotted against the responsibility of the signal (black). ..... 79
2-6 Pixels corresponding to the cell fluorescence signal in the raw image (left) will have signal responsibility near one (shown in white). The red boundary curve corresponds to responsibilities greater than 0.95. ..... 79
2-7 The normalized noise distribution (red) and signal distribution (blue) estimations derived from the expectation maximization algorithm are plotted against the responsibility of the signal (black) for a low SNR image. ..... 80
2-8 Low SNR images have less definitive responsibilities (top), while very high SNR images have responsibilities that are misled by error from the OTF (bottom). ..... 81
2-9 An intensity histogram of a 2-D Gaussian intensity profile is fit to the reciprocal function with R² = 0.9779. ..... 82
2-10 An arbitrary sample of Gaussian noise is averaged in a 3x3 neighborhood to de-noise the original sample (1) and produce pixel values closer to the mean intensity (2). ..... 82
2-11 A noisy raw image (top left) is convolved using a 3x3 averaging kernel (top right), a 7x7 averaging kernel (bottom left) and a 7x7 Gaussian kernel (bottom right). ..... 83
2-12 Wavelet analysis (top) of a sinusoid reveals the periodic nature of the signal at shorter scales (higher frequency) that decays at longer scales (lower frequency) and breaks down at the pinch. ..... 84
2-13 The elements of the Haar and Daubechies2 wavelet family are organized by decomposition (top rows) and reconstruction (bottom rows) wavelets, and divided between low-pass (left) and high-pass (right) filter characteristics. ..... 85
2-14 A color photograph (topmost) is subjected to a single level of Haar wavelet decomposition to produce a down-sampled approximation (middle left) and three high-pass images. ..... 86
2-15 The total residual between the original and Haar wavelet deconstructed then reconstructed images is negligible. ..... 87
2-16 An intensity histogram of a fluorescence image (inset) is subjected to the expectation maximization algorithm to extract an estimated pixel noise density (red line). ..... 87
2-17 The intensity histogram of a first-level horizontal high-pass Coif3 wavelet decomposition reveals significant signals outside of the natural range of pixel noise variation of 3 standard deviations (red lines). ..... 88
2-18 The black pixels (left) represent the absolute signal greater than 3 standard deviations of pixel noise within the first-level horizontal high-pass Coif3 wavelet decomposition. ..... 88
2-19 A typical noisy digital image of cell fluorescence (left) is compared to the result of Coiflet3 wavelet de-noising, using 3 standard deviations as a high-pass threshold before image reconstruction. ..... 89
2-20 A typical digital image of cell fluorescence (left) is convolved by an averaging filter (right). ..... 89
2-21 The residual between a raw image and a low-pass result contains the remaining high-frequency content of the image, including object edges and pixel noise. ..... 90
2-22 The sorted pixel intensities of a residual between a raw image and a low-pass result (black) are plotted against the sorted pixel intensities of a synthetic noise image constructed based on the residual noise statistics. ..... 90
2-23 The mean squared error between pixel values of a sorted residual image for pure noise and a sorted synthetic noise image scales with the variance of the residual noise. ..... 91
2-24 For pure noise, the standard deviation in dinosort output versus the raw image standard deviation consistently demonstrates a noise reduction of approximately 34% based upon a histogram of the noise reduction fraction. ..... 92
2-25 Comparison between an original residual image (left), an unscaled rebuilt sorting result (middle) and a scaled rebuilt sorting result (right) demonstrates the significant reduction of noise along with retention of high-pass content. ..... 92
2-26 A test image was created using a binary disk shape with a 25-pixel radius and additive Gaussian noise with a zero mean and unit variance (left). ..... 93
2-27 Expectation maximization using the Gaussian mixture model does not have the statistical resolution to distinguish pixel intensities from the background and the object in a test image. ..... 93
2-28 The noise reduction fraction versus computation time for Gaussian blurring with a 3x3 kernel (dots), dinosort (pentagrams), and anisotropic diffusion using Lorentzian (+), Leclerc (x) and Tukey (triangles) influence functions. ..... 94
2-29 The results of several de-noise algorithms after the indicated number of iterations on the test image were selected based on similarity in noise reduction fraction. ..... 95
2-30 Segmenting the de-noise result of several algorithms demonstrates the retention of high spatial frequencies for dinosort after two iterations, based upon smaller holes in the space of the object. ..... 96
2-31 Digital images may contain low spatial frequency sources of error that appear as glare locally near regions of brighter cell signal and prohibit accurate segmentation despite de-noising. ..... 96
2-32 The Jansson-Van Cittert algorithm attempts to iteratively remove low spatial frequencies, but may generate false boundaries after segmentation. Expectation maximization was used to threshold the convolution result. ..... 97
2-33 Wiener filtering attempts to find the optimal least squares solution to the classical imaging equation. Expectation maximization was used to threshold the convolution result (red curve). ..... 97
2-34 Regularized functions were developed to deconvolve images and avoid negative values in the result. Expectation maximization was used to threshold the convolution result (red curve). ..... 98
2-35 The Lucy-Richardson algorithm models noise as a Poisson distribution and finds the maximum likelihood solution to the classical imaging equation. Expectation maximization was used to threshold the convolution result. ..... 98
2-36 Blind deconvolution finds the maximum likelihood solution to the classical imaging equation without the need to input a specific guess for the PSF. ..... 99
2-37 The preprocessing step of automated spatial filtering calculates a low-pass filter size by finding the radius of a circle of equal area to a rough segmentation result. ..... 99
2-38 Automated spatial filtering employs a low-pass kernel to convolve raw images. A 2-D Gaussian (left) is used in practice, but the wedding cake filter (middle) also generated good results. ..... 100
2-39 Convolution by a low-pass kernel generates a blurry image (left), and subtraction from the raw image generates a high-boost result (right). ..... 100
2-40 Automated spatial filtering without de-noising (left) and with the dinosort algorithm applied beforehand (right) generates very accurate cell boundaries after segmentation by a simple threshold. ..... 101
2-41 Pixel intensity histograms before (top) and after (bottom) ASF processing show the same intensity threshold (red vertical line). ..... 102
2-42 The intensity of the raw image (black) along the discovered boundary after ASF demonstrates large variation in pixel intensity due to low spatial frequency information in the image. ..... 103
2-43 Otsu's method minimizes the variance of pixel values above and below the threshold. ..... 103
2-44 An intensity histogram demonstrates the result of the Ridler-Calvard thresholding method, with the corresponding segmentation inset. ..... 104
2-45 The Laplacian of a Gaussian (LoG) and Canny filters generate piecewise boundary results. The Marr-Hildreth algorithm finds the zero crossings of a LoG filter to detect objects. ..... 104
2-46 The level set algorithm was initialized either by thresholding (top left) or by drawing a box (bottom left). ..... 105
2-47 A fusion of level sets and the ASF algorithm significantly enhances the ability to detect cell boundaries, but does not outperform ASF alone. ..... 105
2-48 A heart CT scan was initialized for level set segmentation using a box (top left) and converges on the heart chambers after roughly 7 seconds and 150 iterations (top right). ..... 106
2-49 Stacking the segmentation results after repeated application of the ASF algorithm on an image of the cytoskeleton (top left) using different rescaling values generates a sharpened image (top right). ..... 107
2-50 Stacking the segmentation results after repeated application of the ASF algorithm on an image of the actin cytoskeleton (top) using different rescaling values generates a sharpened image. ..... 108
3-1 The displacement between centroids of block pieces is measured for the addition of a square block both vertically and diagonally. The displacement is different depending on the original orientation of the block piece. ..... 131
3-2 Principal component analysis is used to inscribe a maximal ellipse into a cell boundary. The eigenvectors of the boundary coordinates form a rotation matrix that rotates the boundary, aligning the major axis with the x-axis. ..... 132
3-3 Inscribing a maximal ellipse partitions shapes and can help identify the number of protrusions as well as aid tracking by centroids of the partitions. ..... 132
3-4 Sixteen NIH 3T3 fibroblasts have been partitioned by their maximally inscribed ellipse, demonstrating the range of protrusions this cell type can achieve. ..... 133
3-5 Using additional shape information may help to understand the behavior of migrating cells in addition to displacement information. ..... 133
3-6 Segmentation of a cell (top) performed by simple thresholding yields noisy boundaries, which can drastically skew the distribution of cell parameters. ..... 134
3-7 The circularity for basic shapes is presented. ..... 135
3-8 Cell shape parameter distributions are significantly skewed by bad image segmentation, even for visually acceptable results. ..... 136
3-9 The symmetry differences of NIH 3T3 fibroblasts (top) and those treated with TIAM1 shRNA (bottom) are demonstrated using 2-D contour plots of the distributions. Treatment by shRNA influenced a decrease in symmetry. ..... 137
3-10 The distance between the cell centroid and the centroid of a maximally inscribed ellipse was normalized by the major axis length. ..... 138
3-11 The distribution of circularity and eccentricity for single migrating NIH 3T3 fibroblasts changes upon treatment by miR-34a, and plasmids for Rac1 V12, RhoA T19 or Cdc42 T17. ..... 139
3-12 Circularity and Eccentricity can generally classify differences in cell populations based on cell type or after drug treatment. ..... 140
3-13 Circularity and Eccentricity can generally classify differences in cell populations based on cell type. ..... 141
3-14 The three colored circles indicate rounded (orange), elongated (blue) or complex (purple) cells. These divisions do not take advantage of the complete information a set of boundary coordinates provides. ..... 142
3-15 Very rounded cell types may be very difficult to classify using circularity and eccentricity, causing a bias in the ability to analyze their behavior versus other cell types. ..... 142
3-16 Circularity and Eccentricity can generally classify differences in cell populations based on differences in the protein activity of key cytoskeletal proteins. ..... 143
3-17 ANOVA of the circularity, eccentricity and number of protrusions for several cell types and treatment conditions does not reveal the expected significant differences noted visually between single cells from each group. ..... 144
3-18 A live NIH 3T3 fibroblast adopts many configurations as it moves along a trajectory, but they are visually similar and their shape parameters do not reflect differences between subtle shape changes. ..... 145
3-19 Minimal geodesics represent the shortest paths in the curved space of a manifold. The tangent space near a point on the manifold resembles flat Cartesian space and is useful for calculating geodesic paths. ..... 146
3-20 Each pixel represents the minimal geodesic distance between two cell shapes, with the diagonal representing self-comparisons. ..... 147
3-21 The geodesic distance matrix can be transformed to a similarity matrix for use in the normalized graph cut clustering algorithm using the Gaussian function. Brighter pixels represent similar shapes. ..... 148
3-22 Three clusters (right) are rapidly computed using normalized graph cuts for a 183 x 183 similarity matrix (left). ..... 149
3-23 The clustering result (blue, green or red) based on geodesic distances provides greater resolution than traditional shape parameters, even for very subtle differences in the cell boundary. ..... 150
3-24 The calculation of a mean shape using geodesic distances and tangent vectors functions identically to this example using 16 random points. ..... 151
3-25 The mean shape of each cluster can be calculated using the shape manifold to provide potential insight into cell migration behavior. ..... 152
Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

THE MULTIDISCIPLINARY PERSPECTIVE OF CELL SHAPE

By

Stephen Hugo Arce

August 2014

Chair: Yiider Tseng
Major: Chemical Engineering

The role of imaging in medicine, drug discovery and fundamental cell biology research is largely to qualitatively verify the results of chemical assays. In this secondary capacity, human judgment or computer-aided analysis techniques are utilized to identify features in images that signify the presence of disease, drug response or other significant changes from a control condition. Full automation of image analysis has two potential benefits: to increase the throughput of analysis and generate diagnostic data more rapidly, but more importantly to extract quantitative details about patient samples and experimental specimens. Via statistical analysis, image-based quantitative parameters have the power to reveal subtleties that the human eye cannot detect, and they may also improve the ability to compare and standardize images used for diagnostic and research purposes.

This work promotes new methods to overcome traditional obstacles to automated image analysis of cells expressing fluorescent proteins while improving the detected cell boundary. Methods to attenuate the influence of Gaussian-distributed pixel noise are discussed, as well as a spatial filtering technique that removes blurring effects from brighter regions in images that can significantly skew local cell boundaries. Measurement of shape in response to drug treatment or other molecular stimuli has provided the foundation to accurately identify the critical cell features present or absent given the stimulus. Those features can help to establish a cell shape manifold that quantitatively describes dynamic cell behaviors and captures them with their intrinsic mean, rather than by non-representative extrinsic means of parameters projected onto a Cartesian space. In that sense, changes in behavior that manifest physically due to some perturbation can be represented as geodesic paths on the manifold, with efficiency relative to the Riemannian distance between those states. Thus, seemingly redundant cell responses can be distinguished by their geodesics, helping to explain the complexity of cell behavior in processes like wound healing and metastasis. Together with automated image analysis, a platform to improve the diagnostic capability of physicians is one potential translational application of this work. The general framework could have a broad range of other applications in computer vision and military defense.
16 CHAPTER 1 INTRODUCTION This dissertation took root while many Chemical Engineering departments were progressively transitioning to include a Biomolecular/Biomedical direction in their academic mission. This transition period held many challenges since funding for biology related research has so firmly been awarded to purely medical/biological institutions, but those challenges also guided the Brownian motion of many multidisciplinary researchers, trying to gain a foothold in a hybrid field. The pa th of this document was motivated by the same guiding forces, beginning by addressing purely biological problems and adapting the available engineering tools to create novel solutions. Often, a discrepancy in the language and training of two previously dis parate fields of research provided obstacles to progress, but then led greater clarity and ability to communicate previously inarticulate ideas . The organization of this work begins with the consideration of cells and the molecular factors that contribute to their migration, structure and pathology. These founding ideas motivated the investigation of image processing and computer vision as necessar y tools to quantify the previously qualitative observations of cellular behavior. The remainder of the introduction chapter attempts to introduce these ideas in clear language and sufficient detail that a reader with a background in either engineering or b iology could find comfortable and still necessary to grasp the later direction. Chapter 2 offers more detail about image processing, introducing the brief history of the field before introducing two new methods for handling pixel noise and low spatial freq uency error intrinsic to fluorescent micrographs of single cells. On their own, these techniques could directly support medical imaging and various other computer vision applications,
but they also lead to the concepts in Chapter 3 through their ability to accurately quantify the boundary of single live cells. Thus, the measurement of cell parameters is discussed, first to suggest that the quantification of cell shape is related to the underlying molecular activity driving cellular architecture and migration behavior, and then to demonstrate that new mathematical tools exist that may more precisely characterize cell heterogeneity. Ultimately, the conclusion and recommendations tie together the themes from biomedical questions, image processing efforts and quantitative cell shape analysis to suggest a vital role for this multidisciplinary strategy in future research. Even so, the engineering tools contained herein have a much greater variety of uses than specific biomedical applications; hence, throughout the document, an emphasis on explanation is maintained. Thus, the more vital contribution of this dissertation is that it may serve as a Rosetta stone to aid the translation of ideas between fields of research and foster the fusion of disparate fields by providing a reference and foundation for new generations of multidisciplinary students.

Motivation

Cells are the fundamental building blocks of higher organisms. Cell shape and adhesion to the extracellular matrix and neighboring cells are major factors in the hierarchical organization of tissues and organs (Bissel et al., 2005). In the clinic, physicians regularly perform visual analysis of biopsy samples, histological slides or medical images to note differences from normal physiology (Shah, 2009). However, the protocols to detect molecular biomarkers from patient samples through traditional molecular biology techniques were firmly established by the 1970s. In contrast, computers and cameras did not have the sophistication to be clinically useful until
comparatively much later, in the 1980s and 1990s (Duncan, 2000). The discrepancy in development has been enough to push computer vision and image analysis into a secondary role behind molecular techniques. Imaging and image analysis have an unlocked potential to drastically change diagnostics and patient care given the proper support and development (Bickle, 2008; Beck, 2011; Rimm, 2011). This potential is already being realized in the realm of fundamental research, which has recently witnessed the emergence of super-resolution techniques such as STORM and PALM (Schwartz, 2011). The fine structures of cellular features are typically obscured because their size is below the optical limit of detection by light, but through repetitive measurements of point sources of light along such structures, the centroids of each small signal can be reconstructed into a histogram that appears to be a beautiful, high-resolution image of the desired structure (Huang et al., 2010). However, information at that length scale is difficult to translate into a medical application, and the cost and limited availability of the hardware prevent super-resolution techniques from reaching smaller facilities. The likely candidate to greatly impact medical imaging is fluorescence microscopy, due to its low cost and widespread availability (Pepperkok et al., 2006). The emergence of the CCD chip and modern computers and storage combine to create a powerful package to rapidly gather and analyze massive amounts of data (Thomas, 2009). This framework is already at work, to a certain extent, in the field of drug discovery (Clayton et al., 2006). Cells are cultured onto multi-well plates and subjected to drug treatment. After incubation and the introduction of fluorescence, the cells are fixed and imaged. The sheer number of images necessitates that subsequent imaging and
statistical analysis be extremely streamlined to promote high throughput and gather enough data to provide sufficient evidence of drug effects (Miklos, 2004). At this stage, fluorescent images are segmented using basic rather than state-of-the-art techniques to foster speed, and then the binary masks are analyzed for the geometric properties they exhibit (Thomas, 2009). Yet poor segmentation generates a large amount of undesired variation in subsequent shape analysis, and a greater sample size is then required to reach a conclusion about the drug effect, which is counterproductive to the high-throughput mission (Wang et al., 2008). The error from segmentation arises because identifying the shape of cells is extremely difficult to automate, even with state-of-the-art methods (Muzzey et al., 2009; Waters, 2009; Dzyubachyk et al., 2010). Cell behavior is highly dynamic and generates a large amount of heterogeneity in biological samples (Yuan et al., 2012). This translates to a variety of image features that segmentation methods must overcome to accurately detect cells (Yao et al., 1997; Waters, 2009). This has influenced many researchers to rely on a high amount of user interaction to gather the information they desire from cellular imaging (Bakal et al., 2007). Openly available software packages like ImageJ and CellProfiler contain a variety of scripts and plug-ins that users can string together and tweak to achieve reasonable segmentation (Carpenter et al., 2006; Collins, 2007). At the extreme, a user will click several points on the image to roughly describe the cell boundary before moving on to analysis (Bakal et al., 2007). More sophisticated segmentation methods like graph cuts and level sets are now available, but they may require extensive training and instruction, can be computationally expensive and slow to run, or have a combination of those undesirable qualities (Shi et
al., 2000; Ray, 2002; Cour, 2005; Chen et al., 2007). In the end, they also require users to guess and check algorithm parameters, which must be adjusted ad hoc for each image (Li et al., 2011). Segmentation encompasses only the first step in the full characterization of cell behavior by image analysis (Zhang et al., 2004). The measurement of cell parameters like length, area and number of protrusions is the next stage of the progression. However, current computer-aided diagnostics use unexpected image parameters, like the total background intensity and the number of bright spots in cellular regions, which possess no straightforward physiological meaning (Yuan et al., 2012). Those quantities are analyzed together with other parameters to detect the presence of disease. Often, spectral qualities of the dataset are used that are even more difficult to assign to any physiological characteristic in the original sample, and the final conclusion may be difficult to explain to patients, despite having a solid statistical foundation. The analysis of physiologically relevant cell features has provided the inspiration for the majority of this work. At the most basic level, a change in cell behavior in fluorescent images can be represented as a deformation of the cell boundary (Machacek et al., 2006). Adding pure ethanol to a dish of live cells will certainly kill them after some time, which can be easily visualized as cells retract into themselves and undergo apoptosis, a form of programmed cell death (Janes et al., 2005). Thus, one can describe the effect of ethanol as a progression of cell boundary deformations, detectable by a series of images and proper segmentation. Under that simple framework, the shape of cells in response to some perturbation was quantitatively measured and compared to normal cell behavior under controlled conditions.
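In its simplest form, this framework reduces to computing shape parameters from a segmented binary mask. The analysis in this work was performed in MATLAB; the brief NumPy sketch below is only a hypothetical equivalent, using synthetic masks, to illustrate one such parameter: the circularity 4πA/P², which is near 1 for a round object and drops as the boundary elongates or deforms.

```python
import numpy as np

def circularity(mask):
    """Estimate 4*pi*A/P^2 for a binary mask.

    Area is the foreground pixel count; the perimeter is crudely estimated
    as the number of foreground pixels with at least one 4-connected
    background neighbor, so values are approximate on coarse grids.
    """
    mask = mask.astype(bool)
    area = mask.sum()
    padded = np.pad(mask, 1)  # pad with background so edges count as boundary
    # A pixel is interior if all four 4-neighbors are also foreground.
    interior = (padded[:-2, 1:-1] & padded[2:, 1:-1] &
                padded[1:-1, :-2] & padded[1:-1, 2:])
    perimeter = (mask & ~interior).sum()
    return 4.0 * np.pi * area / perimeter**2

# Synthetic "cells": a round disk versus a thin, elongated strip.
yy, xx = np.mgrid[:64, :64]
disk = (yy - 32) ** 2 + (xx - 32) ** 2 <= 20 ** 2
strip = np.zeros((64, 64), dtype=bool)
strip[29:35, 2:62] = True  # 6 x 60 pixel rectangle

print(circularity(disk) > circularity(strip))  # the disk is "rounder"
```

The same comparison applied to successive frames of a retracting cell would register the boundary deformation described above as a drift in the parameter value.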
Before getting further into the dynamic behavior of cells, it is important to mention that shape, in general, should disregard scale, orientation and position within an image (Krim et al., 2006; Gonzalez, 2007). It is easy to imagine that car tires all have the same circular shape when viewed from the side; in a picture of cars in a line, tires will be present throughout the image but share the same shape. Using that same example, one can imagine a series of images slowly moving around to the front of the tire. Then, the shape from that front perspective seems more rectangular to the viewer. Thus, a progressive deformation of the tire shape from a circle to a rectangle will be observed, representing how the shape changed based on the viewer's perspective. With shape in mind, the response of cells to any molecular perturbation might be captured through image analysis, so long as the molecular change manifests physically (Aldridge et al., 2006; Beck, 2011). Thus, this work relates closely to cytoskeletal organization, which imparts structure on the cell (Chen et al., 1997; Nakaseko et al., 2001). The nucleus is also several times more rigid than the cytoplasm, has its own nucleoskeleton as well as chromosomes and other cell bodies, and can significantly affect cell shape (Lee et al., 2007; Rafelski et al., 2008). The following sections describe some factors that could influence the cytoskeleton and nucleus positioning once presented to cells, which will be considered later when attempting to assign physiological relevance to a cell shape parameter measurement.

GTPases

The dynamics of cell migration comprise several types of chemical and physical factors (Kole et al., 2004). For instance, it is known that a massive number of proteins is involved in a complicated signaling network to promote cell migration (Lee et al., 2006;
Machacek et al., 2009). Downstream of that signaling, the cytoskeleton undergoes a dynamic reorganization process, mainly involving the Rho GTPase family proteins RhoA, Rac1 and Cdc42, to facilitate cell migration (Hall, 1998; Arthur et al., 2001; Ridley, 2001; Lee et al., 2005). For example, in mesenchymal-type cells, Rac1 and Cdc42 work in conjunction to organize the actin cytoskeleton into lamellipodia at the leading edge of the cell and build dense stress fibers to anchor and stabilize the cell as it moves (Nobes et al., 2005). At the trailing end of the cell, RhoA coordinates actomyosin contractions to detach adhesions and allow the cell to progress forward (Rid et al., 2005). Recently, nuclear translocation and rotation have been studied in non-neuronal cell types. Lee et al. (2005) indicate that nucleus rotation in single fibroblasts is a highly regulated process involving PAR6. In addition, this study further identified a pathway, involving Cdc42, MRCK, myosin and actin, that regulates nuclear translocation (Khatau et al., 2009). It is possible that nuclear translocations due to changes in cytoskeletal regulation for different cell migration strategies might be distinguishable from one another using image analysis. Drugs or other treatments that impede actomyosin contraction can disrupt the coordinated rearward movement, demonstrating that these structural elements are necessary for rearward nuclear translocation to occur and that a mechanical process must exist to properly move the nucleus (Rid et al., 2005). Hence, the development of new methods that connect individual protein functions but emphasize the overall cellular behavior those proteins produce is needed to study cell behaviors such as cell migration. Since actomyosin contraction plays an essential role in the mesenchymal type of cell
migration, the nucleus might serve as a major anchorage site for a migrant cell to pull out its substrate adhesions via the actin cytoskeleton at the trailing edge of the cell (Rottner et al., 1999). From this point of view, the effect of any molecular factor on cell migration can be interpreted through the coupled activities of the Rho GTPases in single cells. This concept drives the pursuit to monitor and analyze individual cytoskeletal remodeling events that are independently controlled by Rho GTPase activity.

MicroRNA

Recent studies have revealed a class of short (~23 nucleotides in length), non-coding RNAs, called microRNAs (miRNAs), that are deeply involved in cellular activities (Stefani et al., 2008; Migliore et al., 2009; Inui et al., 2010). The miRNA expression profile is unique among different cell types (Poy et al., 2004; Lu et al., 2005; Gottardo et al., 2007). Functionally, miRNAs bind to messenger RNAs using a short (~8 nucleotide) recognition sequence to interfere with or halt protein expression. It is predicted that a single miRNA can regulate a subset of more than 100 different proteins, and currently more than 550 different miRNAs have been identified in the human body (Lim et al., 2005). The expression profiles of miRNAs have been used to successfully classify cancers and trace the tissue of origin for metastatic cancers of an unknown primary origin (Rosenfeld et al., 2008). It is currently known that two different microRNAs, miR-10b and miR-221, can change cell migration speed based on wound healing assays (Baranwal et al., 2010). Yet it is also extremely difficult to incorporate the new regulation mechanism of miRNAs into traditional signaling pathways, because proteins within the subset of a given miRNA are not limited to specific pathways, but can broadly affect several molecular pathways and propagate
signals to alter the expression of other miRNAs (Filipowicz et al., 2008; Krichevsky et al., 2009). It has been demonstrated that miRNAs are deeply involved, through molecular signaling pathways, in all biological processes, including proliferation, differentiation, migration, apoptosis, metabolism and immune response, as well as in various disease states, such as cancer, neurological disorders, muscular dystrophies, viral infection and diabetes (Volinia et al., 2006). For example, the activation by cell stress of the transcription factor encoded by the p53 tumor suppressor gene can directly elevate the expression level of miR-34a, which can trigger a major cell event, including G1-phase cell cycle arrest, cell senescence or apoptosis (He et al., 2005; Hermeking, 2007). In addition, miR-21 is known to possess feedback regulatory mechanisms, and aberrant miR-21 expression is linked to cancer cell invasiveness and metastasis through several signaling pathways in different organs and tissues (Meng et al., 2007; Krichevsky et al., 2009). A recent study in dilated cardiomyopathy has further shown that most nodal proteins among the predicted targets of the miRNAs whose expression levels are altered in human heart failure samples are confirmed to be key regulators of cardiovascular signaling pathways (Eisenberg et al., 2007; Prasad et al., 2009). A wealth of new therapeutic targets for colorectal cancer (CRC) pathogenesis was generated (Vogelstein et al., 2013), but due to the novel activity of miRNAs, their genes can also play roles similar to tumor suppressors or oncogenes. Since miRNAs can induce a significant bifurcation of multiple signaling networks and their activities have been linked to almost all cellular events, it is highly expected that miRNAs play a role as master
regulators, in that they could be the switches triggering physiological and pathological events. However, we currently lack the capacity to directly and precisely describe the activity of individual miRNAs on cells, and thus to understand to what extent miRNA regulation manifests as cell phenotype changes (Croce, 2009). Presently, methods to modulate miRNAs are commercially available, and our ability to accurately monitor the physical parameters of single cells within complex microenvironments is continuously being advanced (Gillette et al., 2008; Domachuk et al., 2010). Backed by these developments, quantitative biophysical measurements of miRNA-mediated changes in cell behavior for a given miRNA and cell type could reveal detailed information about the role of miRNA as a regulator of cell migration behavior.

Epithelial to Mesenchymal Transition

One widely accepted model of cancer development tells a story of a progressive accumulation of mutations that slowly influence the transition of healthy cells into benign tumors and finally into highly aggressive, invasive tumor cells (Gupta et al., 2006; Yang et al., 2008). On a case-by-case basis, tumors present phenotypes that develop over different timelines and present different drug resistances. Yet there are some overarching features of cancer that are of critical importance. Nearly 90% of metastatic tumors have an epithelial origin, and upon the onset of metastasis, the 5-year survival of patients drastically decreases (Siegel et al., 2013). A process called epithelial-to-mesenchymal transition (EMT), which typically describes signaling and morphology changes associated with tissue development and wound healing, has been identified as a likely mechanism for epithelial-derived tumors to become invasive and acquire increased plasticity and drug resistances (Wolf et al., 2003). Oncogenic EMT, then, demonstrates cooperation between progressive gene
regulation changes and tumor phenotype evolution that directly results in the development of dangerous, aggressive cancers (Hordijk et al., 1997; Onder et al., 2008). However, consistent with the gradual nature of cancer development, phenotypic changes that coincide with progressive mutations might be too subtle to detect by subjective visual inspection, or the true invasive potential of tumor cells could be hindered by proximity to the colony that was detected and targeted by the biopsy procedure. Therefore, it is a great challenge to characterize EMT. In general, EMT is stimulated by growth factors, such as TGF-β and Wnt, present in the neighboring microenvironment of the cells (Nieto et al., 2002; Nawshad et al., 2005). These growth factors activate transcription factors, including Snail, Slug, Twist and ZEB1, which play critical roles during proliferation, migration and apoptosis (Hajra et al., 2002; Eastham et al., 2007). However, oncogenic EMT has been designated as its own category of EMT, apart from developmental or wound-healing-related EMT, due to some key signaling differences and morphological signatures. Notably, the reduction of surface E-cadherin has been found almost universally among invasive tumors (Conacci-Sorel et al., 2003; Kim et al., 2005). It is widely known that TGF-β signaling plays a major role in the process of EMT, suppressing epithelial function while promoting mesenchymal activities. The introduction of TGF-β to cultured epithelial cells has been shown to induce EMT in vitro for many different epithelial cell types (Walsh et al., 2011). Using near-confluent cell cultures, various markers of EMT may be detected several hours or days after the introduction of TGF-β. A cell escaping a cluster of neighbors must begin by reducing cell-cell contacts mediated by E-cadherin. In the physical sense, reduction of surface E-cadherin activity translates to
fewer cell-cell connections, and it serves as an initial step before subsequent morphological changes. However, little is understood about lone cells experiencing the process of EMT. While drastic cytoskeletal reorganization must occur to facilitate a shift in cell polarity, the transforming cells also evolve to adopt different adhesion strategies to interact with the extracellular matrix (ECM) as they become more migratory (Watanabe et al., 2009). This transition is accompanied by major cytoskeletal remodeling. Yet the connection between EMT-involved molecules and the Rho GTPases (mainly RhoA, Rac1 and Cdc42) has merely been suggested through signaling crosstalk by the PTEN/Akt pathway (Wu et al., 2003). In terms of phenotype, aggressive cells have shown the ability to adopt several migration strategies, and even the capability to shift to a new strategy in the presence of some type of roadblock (Friedl, 2003). Hence, single-cell-based assays have a greater potential to help explain the relationship between gene alterations and subsequent phenotype qualities. Biophysical cell parameters that fully characterize migration and morphology are not currently implemented in the diagnostic process; yet the most deadly aspect of cancer is physical in nature. Merging phenotypic and molecular information to explain invasiveness will open new avenues of diagnosis, and a standard guideline for pathology can be built, while computer-aided pathology can be directly promoted since it is quantitative and could be easily automated.

Multidisciplinary Perspective

Presented with a massive amount of genomic and proteomic information, the boundaries between traditional medicine and computer science are being shattered (Yuan, 2012). Complex learning algorithms are being applied to sort through very large data sets to understand the underlying signaling networks of cells. A shift to support so-called translational medicine has emerged in recent years in hopes of merging fundamental research ideas with patient care through medical physics, engineering and up-to-date software and communications. Even the digitization of old paper records containing patient data represents a combined effort between physicians and computer scientists. Each of these examples demonstrates a paradigm shift toward multidisciplinary research, which is broadly defined as a dual effort between two previously disparate scientific fields. In terms of traditional biology, imaging has played a major role for many years, but in order to update image processing for fundamental research and, eventually, medical applications, a multidisciplinary approach is warranted. Such efforts have difficulty obtaining scientific exposure because they fall between the cracks of major peer-reviewed journals that focus on a single subject, but this problem has largely been recognized, and new funding mechanisms are being created to support multidisciplinary goals. This work represents a multidisciplinary approach to answering very important questions about cell shape. Fundamental biology is necessary to prepare samples, microscopy is necessary to properly image specimens, image processing is required to extract accurate information from obtained images and mathematics is essential to fully utilize the quantitative parameters within data sets. Due to this range of disciplines, the previous topics in the introduction represent only part of a full story. The remainder of the introduction shifts the focus from cell biology to engineering, providing the basic information on digital images required to interconnect the multidisciplinary theme of this work.
Background on Digital Images

Digital images come in several different formats. Depending on how an image was acquired, compressed, converted, stored or read, the information in the image may be represented in many different ways. This chapter describes some basics of digital image formats, explains some common traits of digital images, and discusses image acquisition.

Digital Image Format

The format of a digital image determines how the numerical intensity information at each pixel is stored in bits. Modern cameras might capture 16-bit images, meaning a pixel value is stored using 16 bits and can take on a range of 2^16 values, from 0 to 65535 arbitrary units, or au. Similarly, an 8-bit image will have pixel values ranging from 0 to 255 au. Those pixel values reflect the amount of light detected by the camera during an exposure. This information is often contained in separate channels for color images, but grayscale images can also be in a multi-channel format if each channel contains the same information. Some image processing packages automatically handle importing different image formats for the user, which might be confusing when first learning how to handle digital images and could lead to early roadblocks in image analysis and visualization. Since the majority of the work described in this dissertation was performed through console commands and custom scripts in MATLAB (The Mathworks), that will be the perspective of the remaining discussion about working with digital images of various formats. The default function to read an image in MATLAB is imread. This function is overloaded to accept many types of image formats as an argument, and it automatically chooses the class of variable to output the image information into the
workspace. Figure 1-1 shows the syntax to load three images using imread and check their class using the whos command. One .jpg and two .tif images are loaded as the variables a, b and c, resulting in two three-dimensional unsigned 8-bit integer matrices and one two-dimensional 16-bit integer matrix. The first two dimensions reveal the size of the image, and the third indicates the number of color channels. The fourth channel in the variable b contains the alpha information associated with image transparency. As demonstrated by the differences between the two loaded .tif images, the format of an image does not necessarily tell everything about how intensity information is stored, but the class of the variable can. Single-channel, unsigned 16-bit TIFF images were the preferred format to capture both light images and fluorescence signals during the majority of this work.

Background Subtraction

Taking a picture in the absence of a specimen results in a blank image. Major sources of error for image analysis are debris in the field of view and uneven illumination. Ideally, debris will remain stationary and uneven lighting will remain consistent for a series of images, so that a blank image will capture the intensity information of those features. Then, after the specimen is placed in the field of view and pictures are taken, the pixel intensity values in the blank image can be subtracted from those in the experimental images. Thus, the visual influence of debris and uneven lighting can be effectively removed by subtracting a blank image. Generally, this procedure is called background subtraction, because the features within the field of view that are not of interest can be considered part of the background. After background subtraction, objects of interest are more easily identified.
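The arithmetic behind blank subtraction has one subtlety worth seeing concretely: on unsigned integer pixel values, differences that should be negative do not stay negative. The sketch below uses hypothetical NumPy arrays with made-up intensity values (the workflow in this dissertation itself used MATLAB); on uint16 data, subtracting a locally brighter blank silently wraps around to large values, whereas converting to double precision first preserves the sign.

```python
import numpy as np

# Hypothetical 16-bit frames: a smooth, uneven background plus a dim object.
ramp = np.linspace(1000, 3000, 64).astype(np.uint16)   # uneven illumination
blank = np.tile(ramp, (64, 1))                         # the "blank" image
frame = blank + np.uint16(50)                          # slight exposure drift
frame[20:40, 20:40] += np.uint16(500)                  # object of interest

# Naive subtraction of a brighter blank on unsigned integers wraps around:
naive = frame - (blank + np.uint16(100))   # background pixels become ~65486

# Converting to double precision first preserves negative values:
safe = frame.astype(np.float64) - (blank.astype(np.float64) + 100.0)

print(naive.max(), safe.min(), safe[30, 30])
```

The wrapped background pixels in `naive` would dominate any subsequent thresholding, which is why conversion out of the unsigned integer class before arithmetic is recommended.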
Background subtraction can cause some pixels to attain negative values during the calculations, and since the format of the image is an unsigned 16-bit integer, the computer will round all negative values to 0 au. Similarly, it would round all values exceeding 65535 to 65535 au. Later on, performing multiplication and division on the pixel values results in decimals and fractions being rounded down to the nearest integer value. To avoid these issues, it is recommended to convert the pixel values from uint16 to double precision immediately after loading. Blank images are sometimes difficult to acquire due to experimental constraints, but if a time series of images is taken and the object of interest is dynamic enough, a blank image can be constructed in a piecewise manner from different frames. Figure 1-2 illustrates this process, using the left and right portions of two frames in a time series (top) to create a blank image (bottom left), which is subtracted from other frames in the time series (bottom right). Small differences in the total intensity of each frame might be noticeable in the improvised blank image, but this effect can be overcome by finding the absolute mean difference of pixel values from each side of the blank and then subtracting that value from the pixels on the brighter side. This would work just as well for a cropped patch of an image if dividing frames by an entire side to create a blank is not feasible. The histograms of pixel intensities for a raw frame, the improvised blank and the final background-subtracted images are featured in Figures 1-3 to 1-5. The signal from the background comprises the majority of the variation in the raw frame, but once it is removed via background subtraction, the resulting histogram features a Gaussian-shaped profile and some low-intensity pixels that correspond to the object of interest,
which are easily distinguishable by visual inspection. The presence of the Gaussian profile will be discussed in the next section of this chapter.

Noise

Fluorescence imaging is often chosen over conventional light microscopy for its ability to specifically label certain features of a specimen. Since only the labeled structures of the specimen emit a fluorescence signal, the chance of debris or uneven illumination appearing in the resulting image is greatly diminished. However, for both light and fluorescence images, a phenomenon called pixel noise exists as another source of error for image analysis. Pixel noise is the effect of random fluctuations of pixel values around their true value. In the total absence of light, a CCD camera would still capture some signal at each pixel due to thermal fluctuations of the molecules in the camera that are mistakenly interpreted as incoming photons. Quantum effects of incoming photons can also introduce a bit of variation to pixel values. Variation in pixel values can also be introduced by electrical interference from nearby devices or by certain signal amplification techniques, such as EM gain, which intentionally apply an electric field to the incoming camera signal. Analogue-to-digital conversion must also approximate certain values of the incoming current to represent the signal in a digital format. The culmination of these factors ultimately manifests as small variations from an average pixel intensity value in an ideal blank image, free of debris and uneven illumination. The background-subtracted image of Figure 1-2 generated an intensity histogram with a Gaussian-shaped profile (see Figure 1-5) that corresponds to the pixel noise surrounding the object in the image. Figure 1-6 features a typical image of a cell expressing a fluorescent protein in its cytoplasm, taken with a wide-field fluorescence
microscope. The right panel shows a cropped portion of the full image to better view the pixel noise present. The general sandpaper-like appearance of pixel noise in images is due to the random variation from one pixel to the next. Some camera systems add an offset value to incoming digital images to account for random intensities that would attain negative values. This extra procedure prevents those pixels from being truncated to 0 au by the machine. Regardless of the source, pixel noise is often modeled as additive Gaussian noise on the image signal. In other words, a histogram of pure pixel noise would resemble a Gaussian distribution of pixel intensities, where the mean could be considered the average background intensity and the variance would fully describe the level of noise in the image. Figure 1-7 shows a histogram of pixel noise intensity values fit by a Gaussian distribution using the mean and standard deviation of the pixel noise sample. The R² value of 0.9498 means that nearly 95% of the deviation from the Gaussian model is explained by the variance of the sample, rather than by poor fitting. Further support that the noise is normally distributed is seen in Figure 1-8 in the linearity of the QQ plot, which compares the quantiles of the pixel noise sample to those of a standard normal distribution. This statistical quality of pixel noise is exploited in several different image processing techniques. For now, it is only important to accept that pixel noise occurs independently of the incoming light signal.

Point Spread Function

Even without the presence of pixel noise, the light that reaches the camera in a typical microscope system does not represent the true signal from the specimen. The path from the light source to the camera detector will contain a series of mirrors and lenses, as well as the specimen and the medium that light must travel through, such as
air or oil. Because of small imperfections and the wave nature of light, this path through any microscope system will alter signals in a consistent manner that can be described by the point spread function, or PSF. For a point light source, the PSF determines how the light from the true source will be diffracted and spread out within the eventual focal plane of the detector. Imagine throwing a rock into a pool of water and observing the amplitude of the surface of the water as ripples reach the edge. The water distributes the energy of the rock much as the PSF distributes the energy of photons within a focal plane (Zhang et al., 2007). Modern objectives and filters are often considered to be diffraction limited due to their high level of precision. Diffraction-limited systems only have to account for the wave nature of light when determining the PSF. In this instance, light in the focal plane of microscope systems with circular apertures will form an Airy pattern, which resembles a single bright spot surrounded by progressively dimmer rings of intensity. Figure 1-9 shows the Airy pattern below 2.5% of its maximum intensity to help display the rings. This pattern also extends in the z direction, normal to the focal plane, over a characteristic distance of approximately 2λn/NA², where λ is the wavelength of light, NA is the numerical aperture of the objective and n is the index of refraction of the medium through which the light travels.

Digital Image Resolution

The minimum distance at which two parallel lines are still completely distinguishable can quantify the resolution of an image. The action of the PSF effectively blurs fine lines into each other so that they appear blended and contrast is lost. Hence, very small objects located less than about 200 nm from each other become indistinguishable when imaged using visible (blue) light, and larger objects in images will lose their contrast
at edges because the intensity signal gets blurred as it passes through the microscope system. Sources of light outside of the focal plane will have even less resolution in digital images since the PSF is broader as a function of height. This blurring effect can be described mathematically as a convolution of the true signal with the PSF. Figure 1-10 demonstrates the blurring effect of convolution by an Airy pattern for a simulated disk image. To make subsequent computation easier, the Airy pattern is often approximated by a 2-D Gaussian kernel. Figure 1-11 compares the central bright spot of the Airy pattern to its Gaussian approximation. In the absence of significant pixel noise in an image, a deconvolution process can be applied to reverse the negative effect of the PSF on a signal. Unfortunately, traditional deconvolution is very sensitive to additive pixel noise, and often final resolution is sacrificed by applying a de-noising algorithm beforehand.

Spatial Frequency

The pitfalls associated with deconvolution, de-noising and image resolution led to the development of image processing methods that rely on the spatial frequency in images. Spatial frequency reflects how often the intensity signal of a feature varies across several pixels. While different methods are discussed in detail in the following chapter, the concept of spatial frequency is important to consider as it categorizes many image attributes into distinct frequencies. For example, pixel noise is considered to have high spatial frequency since intensity values vary drastically from one pixel to the next. Intensity inhomogeneity (an unevenly lit background) can be considered low spatial frequency since it gradually changes across the entire image. Furthermore, complex objects like cells can be composed of several features with a range of spatial
frequencies. Small, thin protrusions will possess a higher spatial frequency than the bulky cell body. Hence, for fluorescence and also light microscopy images, the information within is a combination of many spatial frequencies, and this offers a unique perspective to tackle issues with noise and effects of the PSF.

Background on Image Acquisition

Before turning to processing, it is important to discuss the many options a technician has available that will determine how images turn out.

Objectives and Magnification

The numerical aperture was briefly mentioned in regard to resolution and the PSF, but not explained in detail. Numerical aperture is measured as the sine of the half angle of the cone of light that the objective accepts from the sample, multiplied by the index of refraction of the medium between the two. Since a high NA results in a narrower PSF, it is preferable to maximize it by having the objective as close to the sample as possible. For very high magnification lenses, special oil is placed between the sample and the objective to increase the index of refraction from that of air and enhance the NA. Taking these steps helps maximize the resolution of images so that the camera can capture very fine details in a specimen. In biological applications, the scale of cells must be considered. The highest magnification of the microscope system might not be preferable if too much of the cell structure is cut out of the field of view, or if only one cell is observable in each frame. Migrating cells will sometimes move out of the field of view under higher magnification as well. This introduces a trade-off between optimal resolution and observation capability. At lower magnifications, very fine cellular structures such as filopodia might
become unresolved, but the bulk dynamics of the entire cell can be captured. Lower magnification also allows images to capture many cells at once, providing a greater sample size for measurement and statistics.

Fluorophores

Many types of fluorescent probes and labels have been developed since the initial discovery of green fluorescent protein. These molecules absorb light at a certain wavelength and then emit a different wavelength of light. The photoefficiency of the molecule determines how well it converts energy from absorbed light into emitted light, and is a factor in deciding on the intensity of the light source. A better photoefficiency will emit more photons and generate brighter images for the same light source intensity. This is preferable since UV light at 100% intensity can damage living cells and cause photobleaching damage to fluorescent probes. To avoid damage to cells, light is filtered to 12-25% of its full intensity, but photobleaching always occurs, meaning that the fluorophore eventually no longer emits light from the sample. Less intense light only slows the rate of photodecay, so for very long experiments the apparent brightness of the specimen will diminish exponentially over time to eventually become undetectable. Commercial beads typically have a very slow rate of photobleaching, but other molecules like GFP perform less well. As the sample gets progressively dimmer from photobleaching, the level of noise typically stays consistent so that the signal-to-noise ratio steadily decreases. A low SNR makes object recognition very difficult or impossible. One hazard from photobleaching is present even before the experiment commences, when technicians must select positions within a sample and focus the camera. Leaving the UV light on during this period of time will decrease the maximal
signal from fluorophores due to photobleaching. This is seen in Figure 1-12: the initial frame (left) was selected for observation, but photobleaching during sample preparation results in dimmer captured images (right). In practice, it is helpful to use signal amplification features like EM gain and maximal binning to quickly scan through samples with the UV light source off. At a new position, a snapshot can be taken to judge whether enough objects of interest are in the field of view, and if so, the light source can be switched on to perform focusing as quickly as possible. After focusing is complete, the light source can be switched off again before seeking a new position and repeating the process until enough experimental points are set. The emitted light of a fluorophore also determines the maximum resolution of the final image, with blue light providing much better detail than red light, but many dyes and reagents might be reactive or cytotoxic, and technicians are often forced to rely on stable or otherwise reliable products. It is also the case that certain labs have only a few available choices in the matter or have acquired specific cell lines as gifts, which stably express one type of fluorescent protein. In the end, this results in slightly different images of the same subject, and becomes a consideration when compiling a database of images or cross-comparing data from different sources. It is important to remember that a filament imaged with red fluorescence cannot demonstrate the same resolution as the same filament imaged with blue fluorescence, and a lack of fine structures in the former does not mean they were not present, but rather that they were unresolvable. Similarly, the intensity of a fluorescence signal usually does not represent the concentration of either the fluorophore or the target it labels unless that relationship has been painstakingly calibrated (Sluder et al., 2007).
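The exponential photodecay described above can be sketched numerically. This is a Python/NumPy illustration (the analysis in this work was done in MATLAB), and the decay constant, initial signal and noise level are hypothetical values chosen for illustration, not measurements:

```python
import numpy as np

# Hypothetical photobleaching model: the signal decays exponentially while
# pixel noise stays constant, so the SNR falls over the experiment.
S0 = 1000.0      # initial fluorophore signal in arbitrary units (assumed)
k = 0.05         # photodecay rate per frame (assumed)
noise_sd = 20.0  # constant pixel noise standard deviation (assumed)

frames = np.arange(0, 60)           # one frame per minute for an hour
signal = S0 * np.exp(-k * frames)   # exponential photodecay
snr = signal / noise_sd             # signal-to-noise ratio per frame

print(round(snr[0], 1), round(snr[-1], 1))
```

By the last frame the SNR has fallen to a small fraction of its starting value, which is why limiting illumination during setup matters.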
Temporal Resolution

Time has been mentioned in passing to describe photobleaching effects, but fundamentally all experiments must consider two increments of time. The first and most obvious is the total observation time. Tracking a living cell, for instance, can be achieved by taking a picture every minute or so for a period of time, say, one hour. The other important factor to consider is the exposure time of each image. After every exposure, the computer takes a small amount of time to store each image, which can be improved by limiting the region of interest to a smaller number of pixels or by using a binning feature, which clumps neighboring pixels together, treating their total number of captured photons as one output quantity. Furthermore, if the user has specified multiple filters or several positions at each time point, additional time will be necessary to switch filters and shift the stage. Some microscope software helps optimize the order of stage positions to reduce the commute, but it is more likely that switching filters will take more time to complete. All of these factors must be considered within the span of each experimental time point so that one cycle of photographs does not run over time. A larger exposure time will result in a brighter image because more photons can be gathered during image acquisition. Binning also increases apparent brightness, but at the cost of spatial resolution. Thus, if the span between time points is not a concern, the best choice for bright, clear images is to use longer exposure times. However, some dynamic behaviors require rapid observation of the specimen, and a low exposure time combined with a small region of interest and binning can help to achieve a desirable temporal resolution close to real time, or about 33 ms between images. Thus, another trade-off arises between temporal and spatial resolution. Very rapid image acquisition requires the sacrifice of spatial resolution from using features like binning and EM gain to increase the incoming signal of the specimen.
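The time budget for one acquisition cycle can be checked with simple arithmetic. The per-step durations below are hypothetical placeholders, since the actual overheads depend on the camera, filter wheel and stage:

```python
# Hypothetical acquisition-cycle budget for a multi-position, two-filter
# time-lapse experiment; all durations are assumed values in seconds.
exposure = 0.5        # exposure time per image (assumed)
readout = 0.2         # time to store each image (assumed)
filter_switch = 1.5   # time to change filters (assumed)
stage_move = 1.0      # time to shift to the next position (assumed)

positions = 8
filters = 2

# each position: move the stage, then for each filter: switch, expose, read out
per_position = stage_move + filters * (filter_switch + exposure + readout)
cycle = positions * per_position

interval = 60.0  # one time point per minute
print(round(cycle, 1), cycle <= interval)
```

If the total cycle time exceeded the time-point interval, the technician would need fewer positions, fewer filters or shorter exposures.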
Figure 1-1. A screenshot demonstrates how to load three types of images into MATLAB and how the imread function automatically chooses the variable class.
Figure 1-2. An improvised blank image may be constructed by piecing together the two opposite sides of different frames of a time series to achieve background subtraction.
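The compositing trick of Figure 1-2 can be sketched as follows. This is a minimal NumPy illustration that assumes the object of interest sits in opposite halves of two frames, so each frame supplies the other's object-free background; real frames would have slightly different backgrounds:

```python
import numpy as np

rng = np.random.default_rng(0)

h, w = 64, 64
background = 50.0 + 5.0 * rng.standard_normal((h, w))  # noisy backdrop

# frame A: bright object in the left half; frame B: object in the right half
frame_a = background.copy()
frame_a[20:40, 5:25] += 200.0
frame_b = background.copy()
frame_b[20:40, 40:60] += 200.0

# improvised blank: object-free left half of B joined to object-free right half of A
blank = np.hstack([frame_b[:, : w // 2], frame_a[:, w // 2 :]])

subtracted = frame_a - blank  # background-subtracted image of frame A's object
print(subtracted[30, 15] > 150, abs(subtracted[30, 50]) < 1e-9)
```

The object pixels survive the subtraction while the shared background cancels, which is the goal of the improvised blank.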
Figure 1-3. An intensity histogram for an image with significant intensity inhomogeneity (uneven lighting) and some debris in the background, as well as a single object of interest.

Figure 1-4. An intensity histogram for a blank image, composed of intensity inhomogeneity and debris.
Figure 1-5. An intensity histogram for a background-subtracted image.

Figure 1-6. A portion of a fluorescence image is cropped to display pixel noise.
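The crop-based noise characterization of Figure 1-6 leads directly to the simplest segmentation rule, thresholding at the background mean plus two standard deviations. A NumPy sketch with a synthetic frame standing in for real data:

```python
import numpy as np

rng = np.random.default_rng(3)

# synthetic frame: Gaussian pixel noise plus one bright square object
image = 100.0 + 10.0 * rng.standard_normal((128, 128))
image[50:70, 50:70] += 80.0

# characterize the noise from an object-free cropped corner (as in Figure 1-6)
crop = image[:32, :32]
threshold = crop.mean() + 2.0 * crop.std()

# basic segmentation: 1 above the noise threshold, 0 below
mask = (image > threshold).astype(np.uint8)
print(round(mask[50:70, 50:70].mean(), 2), round(mask[:32, 96:].mean(), 2))
```

Nearly every object pixel exceeds the threshold while only a small percentage of background pixels do, which is exactly the behavior expected from the Gaussian noise model.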
Figure 1-7. An intensity histogram of pixel noise fit with a Gaussian distribution of the same mean and standard deviation as the noise sample has an R² value of 0.9498.

Figure 1-8. A QQ plot further demonstrates the Gaussian nature of sampled pixel noise.
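The fit quality reported in Figure 1-7 can be reproduced in outline with synthetic noise standing in for the measured sample (a Python/NumPy sketch; the mean, standard deviation and sample size below are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)
noise = 100.0 + 8.0 * rng.standard_normal(50_000)  # synthetic pixel noise

# histogram of the sample, normalized to a probability density
counts, edges = np.histogram(noise, bins=60, density=True)
centers = 0.5 * (edges[:-1] + edges[1:])

# Gaussian density using the sample mean and standard deviation
mu, sd = noise.mean(), noise.std()
model = np.exp(-0.5 * ((centers - mu) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))

# coefficient of determination between the histogram and the Gaussian model
ss_res = np.sum((counts - model) ** 2)
ss_tot = np.sum((counts - counts.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot
print(r2 > 0.95)
```

With tens of thousands of noise samples the R² is very close to one; the lower value in Figure 1-7 reflects the smaller cropped sample used there.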
Figure 1-9. The Airy disk is shown, cut at 2.5% of its maximum intensity to better visualize the outer rings.

Figure 1-10. A simulated disk shape (left) convolved with a narrow Airy pattern (middle) and a broad Airy pattern (right).
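The radial Airy profile in Figure 1-9 can be generated from the Bessel function J1. The sketch below evaluates J1 from its integral representation so that it needs only NumPy; the grid sizes are illustrative choices:

```python
import numpy as np

def bessel_j1(x):
    """J1(x) via its integral form: (1/pi) * integral_0^pi cos(t - x sin t) dt."""
    t = np.linspace(0.0, np.pi, 2001)
    return np.trapz(np.cos(t - np.outer(x, np.sin(t))), t, axis=1) / np.pi

# radial Airy intensity: I(r) = (2 J1(r) / r)^2, with I(0) = 1 in the limit
r = np.linspace(1e-9, 10.0, 1000)
airy = (2.0 * bessel_j1(r) / r) ** 2

# the first dark ring sits at the first zero of J1, near r = 3.83
first_zero = r[np.argmin(airy[r < 5.0])]
print(round(airy[0], 3), round(first_zero, 1))
```

The central lobe out to the first dark ring is what the 2-D Gaussian approximation in Figure 1-11 captures; the progressively dimmer outer rings are discarded by that approximation.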
Figure 1-11. A 2-D Gaussian curve is often used to approximate the Airy pattern, the theoretical PSF for diffraction-limited microscope systems with circular apertures.

Figure 1-12. Photobleaching can quench the fluorescence signal from a sample, making certain cellular features difficult to extract.
CHAPTER 2
NEW IMAGE PROCESSING TECHNIQUES

Following image acquisition, a number of image processing algorithms and commercial programs are available to extract data from digital images. To aid in object detection and segmentation, methods that remove or reduce noise, deconvolve images or enhance the edges of objects are commonly employed. The specific choice of method for a given application is often ad hoc, but this chapter will demonstrate several of those choices. Dealing with noise is so fundamental that it will lead the discussion, followed by deconvolution, since many deconvolution algorithms require a de-noising step beforehand. Edge detection, segmentation and other object recognition algorithms will then be discussed, ending with a survey of current image processing packages that are available to perform these techniques. Overall, user interaction during image processing should be limited for many biomedical applications to reduce human error, and automated processing techniques are emphasized.

Processing Noise

Since pixel noise is mostly unavoidable, understanding how to characterize and remove it is a major part of image processing. The introduction briefly mentions that the statistical properties of noise are often exploited to remove it, and here, that concept will be explained in more detail.

Image Cropping

Figure 1-6 exemplifies one of the most basic ways to characterize noise. A user can select a portion of the image that contains no objects of interest and perform basic statistics to characterize the pixel noise. Since the mean and standard deviation of the pixel intensities in the cropped region correspond to a Gaussian distribution, almost
96% of noisy pixels will fall within two standard deviations of the mean intensity. This is often used as a basic segmentation method to distinguish objects from noisy backgrounds in digital images: pixels greater than the threshold are labeled as 1 while those below the threshold are labeled with 0. This simplistic view of noise in digital images is very powerful at times, but can critically fail in the presence of uneven illumination in digital images, either from the background or surrounding objects. Figure 2-1 features the result of cropping two distinct regions in a fluorescence image and applying a noise-based threshold to produce two extremely different results. This example demonstrates the potential data variability from human involvement in image processing. Even with training and experience, cropping an image leaves the potential for an arbitrary choice that could become a major source of error.

Robust Statistics

Automating the process of cropping presents some significant barriers because the placement and number of objects will differ from image to image. In some images, the field of view might be too crowded with objects to crop the image and characterize the noise. However, during the experimental setup, the field of view is almost unanimously chosen to center around the object of interest. Under that heuristic, it is reasonable to sample the noise using the pixels in the periphery of digital images instead of a cropped region. Using the periphery of images to characterize noise is easy to automate, but still encounters the issue of signals at the edge of the image that do not correspond to the background. Figure 2-2 shows an intensity histogram of the pixels in the inset image. The arrow indicates the presence of cell signals, which skew the anticipated Gaussian
distribution. The mean and standard deviation of these pixels are heavily skewed by the presence of brighter pixel intensities in the sample and lead to the segmentation result in the left panel of Figure 2-3. This can be overcome by calculating a threshold using more robust statistical parameters: the median and the median absolute deviation (Huber, 1981). For an ideal normal distribution, the median will be equivalent to the sample mean, and three median absolute deviations will approximately equal two standard deviations, so that a similar threshold may be calculated and applied to images. The right panel of Figure 2-3 features the more preferable segmentation result obtained by applying robust statistics to estimate the noise in peripheral image pixels.

Expectation Maximization

Although the speed of simple calculations is an advantage in image processing, more sophisticated methods exist to estimate and manage noise in digital images. The expectation maximization algorithm is one such method, employing a guess of the intensity distribution of an image and iteratively converging on the best estimate of the parameters of that model distribution (Jiang, 1993). The Gaussian distribution of pixel noise suggests that the overall distribution of image intensities could be composed of a Gaussian and some other probability distribution that describes the remaining signal. In this work, a two-Gaussian mixture model was applied to estimate the noise statistics of images by expectation maximization. A responsibility function was defined as:

r_i = N1(x_i; mu1, sigma1) / [N0(x_i; mu0, sigma0) + N1(x_i; mu1, sigma1)],
where N0(x; mu0, sigma0) and N1(x; mu1, sigma1) are Gaussian densities with means mu0, mu1 and variances sigma0^2, sigma1^2. Initial guesses for mu0 and mu1 were set as the minimum and maximum image intensity, and both sigma0 and sigma1 were set to the standard deviation of all intensities. Then, the means and variances were updated using the responsibility following:

mu0 = sum_i (1 - r_i) x_i / sum_i (1 - r_i),

sigma0^2 = sum_i (1 - r_i)(x_i - mu0)^2 / sum_i (1 - r_i),

mu1 = sum_i r_i x_i / sum_i r_i,

sigma1^2 = sum_i r_i (x_i - mu1)^2 / sum_i r_i,

where x is the vector of the original image intensities. Thereafter, the mean of r is calculated and compared to its value from the previous iteration until their difference is less than a small tolerance. Once convergence is achieved, N0 adequately represents the Gaussian distributed noise in the image, as seen in Figure 2-4. The responsibility function of the signal is compared to the estimated densities in Figure 2-5, showing that values near to 1 more likely correspond to the cell signal. This is supported in Figure 2-6 using a side-by-side comparison of the raw image and an image of each pixel responsibility. The red curve in Figure 2-6 was determined by thresholding the responsibility image by 0.95. Application of expectation maximization using a two-Gaussian mixture model can be very powerful, but if images have a very low SNR, the noise and signal distributions become difficult to statistically resolve, just like particles smaller than the optical resolution are difficult to spatially resolve. Figure 2-7 demonstrates the significant overlap of the estimated noise and signal densities after expectation maximization. Thresholding the responsibility delivers a completely inaccurate boundary estimate for these low SNR images, as shown in the top row of Figure 2-8. The bottom row of Figure 2-8 shows how very high SNR images are also difficult to threshold using pixel responsibilities. However, this is due to extra glare surrounding the bright cell fluorescence signal from the OTF, which will be revisited later in this chapter. The low SNR case demonstrates the necessity to not only estimate the statistics of pixel noise, but also diminish or remove the noise using some sort of de-noising process. The next few sections will introduce some common de-noising methods, but before that, the two-Gaussian mixture model should be revisited as an assumption. From Figure 2-7, it is apparent that the signal density estimate extends to intensities below the mean background level. A better assumption may be an exponentially distributed signal density or a reciprocal density, based on the shape of intensity histograms. Figure 2-9 shows the histogram of a synthetic 2-D Gaussian intensity profile (inset image) fit to the reciprocal function (R² = 0.9779). If the intensity profile of the cell signal is modeled by a reciprocal density function, improvement of the expectation maximization performance may be possible. Unfortunately, the reciprocal function is difficult to fit into the expectation maximization framework in a straightforward manner due to its asymptotic tendency.

Gaussian Blurring

The most basic de-noising technique is to average the intensities within a small neighborhood of the image. In a 3x3 neighborhood, nine pixels would get averaged together, and if only pixel noise is present, this process would be equivalent to sampling from the underlying noise distribution to get a sample mean nearer to the true mean than the original value. Figure 2-10 demonstrates this averaging process for a pure noise sample.
Mathematically, averaging pixel values over a moving window throughout an image can be performed via convolution following:

I'(x, y) = (I * K)(x, y) = sum_i sum_j I(x - i, y - j) K(i, j),

where * is the convolution operator, I is the image, and K is a filter with n x n elements, also called the kernel. When the kernel is an array whose elements all equal 1/n², it corresponds to the basic averaging filter. In terms of spatial frequency (refer to Chapter 1), averaging filters are low pass filters. The general characteristic of a low pass filter is that its elements are non-negative and sum to one. The flatness of averaging filters allows many bands of low spatial frequency to pass to the convolution result, which makes de-noised images appear blurry. Gaussian kernels are also low pass filters, but they tend to keep more of the high frequencies in the de-noised result, which corresponds to sharper images with greater contrast. Figure 2-11 demonstrates filtering with 3x3 and 7x7 averaging filters versus a 7x7 Gaussian filter. The Gaussian filter results in a less noisy image that is also less blurry than simple averaging (similar to the 3x3 averaging filter result for the example). Despite the improvement over averaging, use of a Gaussian kernel may still cause a loss of edge resolution at the boundaries of objects, because edges are high spatial frequency content just like pixel noise. This attribute has driven the development of more sophisticated de-noising techniques such as wavelet de-noising and anisotropic diffusion, both discussed shortly.

Wavelet De-noising

The wavelet-based approach to image processing was developed to decompose images in a similar manner as Fourier decomposition, but in a way that retains spatial correspondence to the frequency content (Daubechies, 1992). Fourier decomposition
uses trigonometric functions that have infinite support (i.e., they have non-zero values as they approach infinity), and can detect the presence of certain spatial frequencies in images, but provides no information about the location of that content. As an alternative, wavelets are short functions with adjustable support (they equal zero after a certain distance). Instead of applying a single filter to images, the wavelet approach to frequency analysis applies a succession of filters, all derived from a certain wavelet family. Figure 2-12 demonstrates the frequency analysis of an arbitrary sinusoid using the wavelet approach. Common wavelet families include the Haar, Daubechies, Coiflet and Symlet wavelets. Haar wavelets are essentially step functions that step from negative to positive instead of zero to one, and are scaled in such a way that they can form an orthonormal basis for the signal. For instance, the first Haar wavelet is simply an averaging filter with elements [1/sqrt(2), 1/sqrt(2)], which has a Frobenius norm of one. Thus, it is a low pass filter, and those that remain in the family can all be considered as high pass filters. The dot product of these low and high pass filters will be zero. Other wavelet families are also scaled so that their Frobenius norm is one, organized into an orthonormal basis in a similar manner, but have more complex band pass properties. Figure 2-13 is a screenshot of the elements of the Haar and Daubechies2 wavelet families generated in MATLAB. The major application of wavelet decomposition is to compress and store images. The level of decomposition determines how compressed the intensity information becomes, but (and more importantly to de-noising) after one level of decomposition, images get segregated by their spatial frequencies. The compressed image will contain the best estimate of low spatial frequencies and appears as a down-sampled version of the original image, but there are also three high pass images after 1st level decomposition that correspond to vertical, diagonal and horizontal filtering of pixels. Figure 2-14 demonstrates a single level of wavelet decomposition using the Haar wavelet family. Ideally, the pixel noise resides within the high pass images after one level of decomposition. A great advantage of wavelets is not that they can simply decompose images, but that the decomposed images can be reconstructed using the same family of wavelets without significant loss of information. The wavelets used for reconstruction are simply those used for decomposition with elements in reverse order; hence, the set of all filters are known as quadrature mirror filters. Figure 2-15 shows the differences between each pixel for an original image and a reconstructed image, called the residual. The residuals are on the order of double precision machine error (about 10^-16), which is a negligible difference that disappears when converting the reconstructed image from double precision back into an unsigned 8-bit integer. The key to wavelet-based de-noising lies in deconstructing images, and reconstructing them with eliminated (set to zero) high pass information. Unfortunately, some of the deconstructed high pass information corresponds to the edges of cells, and reconstructing images with all high pass values at zero is equivalent to performing a low pass filter using the first decomposition wavelet, leading to blurred object edges. Instead, a threshold of the high pass values can be set based on the intrinsic pixel noise of the image. Figure 2-16 features a histogram of a typical fluorescence image that has been characterized by the expectation maximization algorithm to extract the pixel noise
statistics. In Figure 2-17, the histogram of horizontal high pass values for the decomposed image is displayed, and it features two lines that indicate three standard deviations of the original pixel noise. Pixel noise only varies within a certain range (±3 standard deviations), so that high pass information outside of that range more likely corresponds to cell signals. Figure 2-18 features the result of thresholding a single level high pass result by the pixel noise standard deviations, supporting that the cell signal is responsible for larger high pass values through a comparison with the low pass decomposition result. Setting high pass values to zero before reconstruction results in a final processed image with less high frequency information than the original, which appears blurry, similar to Gaussian blurring, in regions that were de-noised (see Figure 2-19).

Anisotropic Diffusion

Direct modification of high pass information in images had earlier implementations in an algorithm known as anisotropic diffusion (AD). Instead of a complex high pass filter, AD uses a simple gradient operator based on nearest neighbor differences in pixel values, applied to the heat diffusion equation:

dI/dt = c (laplacian I),

where image intensity I is diffused instead of heat, c is a constant and the Laplacian is the second-order spatial derivative operator. The discrete formulation of the AD algorithm is as follows:

I_s(t+1) = I_s(t) + (lambda/4) sum_p g(grad I_{s,p}) grad I_{s,p},

where g is a function of the simple gradient, grad is the gradient operator, and the sum is performed over the four nearest neighboring pixels p at each position s in the image. The anisotropic aspect of AD refers to selective diffusion that smooths intensities between pixels if the gradient is low, and leaves pixels with high gradients unchanged. This is done in an effort to preserve object edges in the image, similarly to how the high pass information in wavelet de-noising selectively gets set to 0 (see previous section). Thus, the choice of g, also called an influence function, can significantly affect how the AD algorithm de-noises an image. In their initial report, Perona and Malik proposed two functions:

g(grad I) = 1 / (1 + (|grad I| / K)^2), and

g(grad I) = exp(-(|grad I| / K)^2),

which correspond to a Lorentzian error norm and the Leclerc robust error norm, respectively. Another influence function, the Tukey biweight, was later proposed by Black:

g(grad I) = [1 - (|grad I| / K)^2]^2 for |grad I| <= K, and 0 otherwise.

In these functions, K is comparable to a threshold of the gradient and may be calculated by estimating the standard deviation (sigma) of the pixel noise in the original image and setting K as a multiple of sigma. The literature also mentions setting K dynamically, which involves determining the
threshold that encompasses 90% of the gradient density within a histogram at each iteration of AD. In essence, the edge-preserving motivation of AD is similar to keeping some high pass information in wavelet decomposition images before reconstruction, except that AD does not remove low pass information before calculating gradients. This motivated the development of a new de-noising method, which takes advantage of low pass removal before attempting to preserve edge information. This is discussed in detail in the next section, and a comparison against other de-noising methods is performed.

Dinosort

A new de-noising technique is introduced in this section. It builds on some of the noise characterization and removal techniques discussed in the sections above, and performs well for the Gaussian distributed pixel noise common in many types of digital images. The procedure will be discussed in detail, then benchmarked against other de-noising methods to support its efficacy and efficiency. Beginning from the concept of Gaussian blur, dinosort first applies a low pass filter to an input image. The Gaussian kernel tends to pass too much of the high spatial frequencies in images, so an averaging kernel was chosen to perform the low pass:

K = (1/9) [1 1 1; 1 1 1; 1 1 1].

Figure 2-20 demonstrates low pass filtering by a 3x3 averaging kernel. Since low pass filtering removes high spatial frequency content from the original image, the residual between the raw image and the low pass result should reflect the omitted high frequency information:

R = I - I * K.

This residual is used as an estimate of the pixel noise contained in the raw image, but visual inspection usually indicates some of the signal from the edges of objects is also present, which is easily noted in the residual featured in Figure 2-21. For that reason, the pixels in the periphery of the residual are used to estimate the statistics of the noise, then a synthetic noisy image is created using the calculated variance and a zero mean, with pixel intensities normally distributed as N(0, sigma^2). The next stage of the method involves sorting the pixel values of both the residual and the synthetic noisy image from lowest to highest intensity. Once the intensity values are sorted in this manner, very low and very high intensities from the residual image will be further from zero than the synthetic noise because they are not restricted by the statistics of a normal distribution. However, middle-valued sorted pixels from both images correspond to normally distributed noise, and their difference in intensity will be very small. Figure 2-22 plots the sorted intensities of the residual and synthetic noise against each other, showing how similar their aligned values are. The mean square error of the sorted difference scales predictably with the noise level when the algorithm is fed pure Gaussian noise as input (R² = 0.9786, see Figure 2-23). Once the sorted synthetic noise is subtracted from the sorted residual, each pixel is repositioned to its original location in the raw image. The last step of dinosort is to add the rebuilt sorting result to the low pass result from the first step. This produces a de-noised image with approximately 34% of the original noise deviation when the algorithm is fed pure Gaussian noise as input, as shown in Figure 2-24. However, for images containing objects, simply replacing the rebuilt sorting result with the original high pass information from the image attenuates high frequencies too much, so that the algorithm essentially acts as a common averaging filter (data not shown). To resolve this issue, a multiplicative scaling of the rebuilt sorting result is performed, and the final de-noised image is reconstructed by adding the initial low pass result and the scaled rebuilt sorting result. Figure 2-25 compares the residual to both the unscaled and scaled rebuilt sorting results, showing how high pass information near the object is recovered from scaling while noise is significantly reduced. Dinosort outperforms other de-noising methods in speed and noise reduction. To support this claim, a synthetic noisy image was employed to measure how fast an iteration of dinosort would take versus anisotropic diffusion with Lorentzian, Leclerc and Tukey influence functions, Coiflet3 wavelet de-noising and Gaussian blurring with a 3x3 kernel. Figure 2-26 shows the test image next to a poor segmentation result based on expectation maximization. From Figure 2-27, it is apparent that the two estimated Gaussian densities are too near in intensity to each other to gain resolution using the responsibility function. After speed testing, 25 iterations of each algorithm were performed to de-noise the noisy test image, and the standard deviation of each result was recorded. Figure 2-28 plots computation time on the x axis and noise reduction on the y axis, with each point corresponding to a single iteration. After one iteration, dinosort drastically outperforms the other de-noising algorithms by several iterations. The only competitor is Gaussian blurring, which has extremely low computational complexity and runs much faster, so that it performs similarly after three iterations in less than the time of one dinosort iteration. However, after several applications of the Gaussian kernel, low pass frequencies
become enhanced more than with dinosort, which conserves the high spatial frequencies of the object. After only two iterations, dinosort delivers good de-noising performance in less time, with better retention of object high frequencies, as seen in Figure 2-29 and Figure 2-30.

Low Spatial Frequencies

Objects in images appear to have solidity due to regions of consistent or gradually changing pixel-to-pixel intensity values. These image features are considered to be low spatial frequency information and can also include blurriness from out-of-focus objects and diffracted light due to the influence of the PSF. Figure 2-31 demonstrates how the presence of low spatial frequencies that distort object segmentation cannot be addressed by further de-noising, and may even be amplified by de-noising techniques. This section describes several deconvolution techniques, which originated to remove the influence of the PSF in measured signals but are applicable to the removal of many blurry types of features within images, characterized by low spatial frequency. Based on the poor performance of existing techniques for images of cell fluorescence, a new technique to reduce the influence of low spatial frequency sources of error is introduced.

Jansson-Van Cittert Deconvolution

One of the earlier efforts to remove bias introduced in signals by a transfer function was described in 1970 by Jansson in the effort to enhance the resolution of spectra. The transfer function in that work was the instrument impulse response function, which acts similarly to the PSF to modify a measured signal via convolution, although the physical mechanism may differ from that of an optical system in a microscope. The attempt to reverse the convolution operation marked one of the first iterative deconvolution techniques.
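The iterative reversal just described can be sketched in code. The following is a hedged Python/NumPy illustration, not the dissertation's own implementation (which is in MATLAB): the function name jvc_step, the clipping bounds, and the triangular relaxation form r(o) = r_max(1 − |2o/v_max − 1|) are assumptions chosen to satisfy the constraints described below (no negative pixels, no values beyond the bit depth).

```python
import numpy as np
from scipy.signal import fftconvolve

def jvc_step(estimate, raw, psf, r_max=1.0, vmax=65535.0):
    """One Jansson-Van Cittert-style update: add a relaxed high-boost
    correction so the next estimate re-convolves closer to the raw image."""
    low = fftconvolve(estimate, psf, mode="same")    # low-pass result
    high_boost = raw - low                           # recreated high frequencies
    # Relaxation tapers the correction near 0 and the bit-depth limit,
    # discouraging negative or saturated pixel values.
    relax = r_max * (1.0 - np.abs(2.0 * estimate / vmax - 1.0))
    return np.clip(estimate + relax * high_boost, 0.0, vmax)
```

Iterating jvc_step with the raw image as the initial estimate progressively restores high spatial frequencies suppressed by the transfer function, at the cost of eventually amplifying noise.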
The algorithm originated from the concept that some unknown, true image gets convolved by a transfer function to become the obtained digital image. In terms of spatial frequency, the JVC algorithm assumes the transfer function is a low-pass filter, then attempts to recreate the lost high spatial frequencies of an ideal, true image. The recreated high spatial frequency information is generated using a high-boost method, in which an image is subjected to a low-pass filter, and the result is subtracted from the same image. By subtracting the low frequencies, high frequencies will remain. The dinosort algorithm described in the previous section relies on the high-boost principle to obtain the residual before the sorting step. In the case of JVC, the high-boost result is added back together with a different low-pass result to get the next approximation of a true image as follows:

i_hb = i − g ⊗ i

and then,

i^(n+1) = i_lp^(n) + r(i^(n)) i_hb, with i_lp^(n) = g ⊗ i^(n)

where the superscript (n) denotes the current iteration step, g is the 2-D Gaussian approximation of the PSF, and the subscripts hb and lp represent the high-boost and low-pass results on the image. The function r(·) is a relaxation parameter that ensures that the high-boost result does not exceed certain constraints. For instance, pixel values should not take on negative values or exceed the bit depth of the input image. Originally, a 25-point Savitzky-Golay cubic polynomial kernel was applied to smooth data. In this work, a 5-point Savitzky-Golay cubic polynomial kernel ([−3 12 17 12 −3]/35) was used to better represent the raw images:
K = sᵀ s

where s is the 1-D Savitzky-Golay kernel and ᵀ indicates the transpose, so that the 2-D smoothing kernel K is the outer product of the 1-D kernel with itself. This process helps to ensure that noise does not explode in value after a few iterations, but the JVC algorithm still manages to amplify regions of the image that do not correspond to the object, as shown in Figure 2-32.

Built-in MATLAB Deconvolution Methods

Building from the classical concept that images are the result of a convolution of a true image by some PSF, methods began to take into account the possibility of additive noise in raw images. Without derailing into the independent topic of deconvolution, this section will briefly introduce four built-in MATLAB deconvolution methods. It is noteworthy that the major developments in deconvolution emerged in the field of astronomy rather than microscopy. This is relevant in the sense that light from stars might undergo much less diffraction in the vacuum of space than a fluorescent signal does when passing through various transparent cellular components such as unlabeled structural proteins, small subcellular compartments and the cell membrane, and an approximation of the PSF for a microscopy system based on calibration cannot take the case-by-case variation of cell physiology into account.

The Wiener filter makes very basic assumptions about a raw digital image and finds the optimal solution to deconvolve an image in the sense of least squares. It projects the Tikhonov-Miller solution to the least squares problem into Fourier space:

F̂ = H* G / (|H|² + K)

where H is the Fourier transform of an approximation of the PSF, G is the Fourier transform of the raw image, and K is a noise-to-signal ratio chosen according to the image. This can be performed using the
deconvwnr function in MATLAB, and a result is shown in Figure 2-33. This formulation does not take into account any a priori knowledge of the solution and can deliver negative pixel values, which led to the development of regularized deconvolution, in which noise is modeled as a Gaussian distribution and negative values are actively avoided. The deconvreg function in MATLAB can perform deconvolution using regularized functions; Figure 2-34 shows a result. The Lucy-Richardson algorithm emerged later on to handle images with Poisson-distributed noise, finding the maximum likelihood solution rather than an optimized least squares result. An example of this method is shown in Figure 2-35, and it can be implemented in MATLAB using the deconvlucy function, but, just like the other three deconvolution methods mentioned in this section and the JVC algorithm, an approximation of the PSF is required to find a result. Bad estimates of the PSF can drastically change deconvolution results. In response to this issue, so-called blind deconvolution evolved to return a maximum likelihood estimate based on very limited initial knowledge of the PSF. The deconvblind function in MATLAB only requires a PSF argument that approximates the size of the kernel, and returns a PSF estimate along with the deconvolution result, both shown in Figure 2-36.

Automated Spatial Filtering

This work introduces a method to remove bias from low spatial frequency information in fluorescent images that could otherwise prohibit the accurate detection of cell boundaries (Arce, 2013). It uses heuristic information extracted from raw images to set its parameters, avoiding user input and making it amenable to automation. Thus, it will be referred to as automated spatial filtering, or ASF. The concept behind ASF emerged independently of existing deconvolution algorithms, but shares the classical
principle that a raw image is the result of some true image that has been convolved by some PSF with low-pass qualities. It differs in that it applies a rescaling step rather than iteration to remove low spatial frequencies. This section will introduce and compare ASF results to existing deconvolution schemes. Some initial parameters are necessary to run ASF on a raw image, but they are easily extracted from the images themselves using techniques discussed in the noise section earlier in this chapter. An estimate of the pixel noise can be calculated either through robust statistics on periphery pixels or expectation maximization. In practice, both of these methods deliver very similar results in most cases, but special care must be taken when image frames contain many cells to ensure that the noise is adequately represented by the calculation. A simple intensity threshold is applied using the noise statistics to determine the number of pixels that correspond to the area of cells. This value will be skewed by the low spatial frequency information, or rather, the glare from brighter regions of the cell fluorescence signal onto the nearby background. This is desirable because it makes segmented cell signals appear more rounded. The next preliminary step of ASF calculates the radius of a circle whose area is equal to the approximate cell area. Figure 2-37 shows a representative circle, overlaid onto a rough segmentation result. Twice this radius value will be used as the kernel size to construct a low-pass filter and perform ASF. This series of preliminary steps works very well for images of single cells, and can be altered when two or more cells are present in a frame to individually calculate a filter size per cell. Issues arise when many cells are touching and appear as a single object after initial segmentation, for which there are a few options: the average cell area may be
used to obtain a filter size if it is known, the largest solitary segmented object area can be used, a value proportional to the size of the nucleus area may be used, or more involved algorithms may be applied in an effort to individually identify connected cells, such as the watershed algorithm. Since this work primarily focused on low-density single cells, these extra measures were often unnecessary, but this issue will be revisited in Chapter 4. After preprocessing a raw image to obtain the ASF low-pass filter size, the algorithm is very simple. Two choices for a low-pass filter were investigated: a 2-D Gaussian filter, and a wedding cake shaped filter, which is composed from a stack of three circular disks that also follows the profile of a 2-D Gaussian curve. For either choice, the standard deviation used to create the filter shape was set as the filter size and the peak of the profile was centered. The wedding cake was an early attempt to represent the lower portion of an Airy disk, and was abandoned in favor of the 2-D Gaussian filter due to similar performance and simpler calculation. Both filters are shown in Figure 2-38, next to the lower portion of an Airy disk, each of size 31 x 31. The raw image is convolved by the low-pass filter, and then, similarly to the high-boost step of the JVC algorithm, the residual between the low-pass result and the raw image is obtained as follows:

i_lp = k ⊗ i

and,

i_hb = i − i_lp

where k is the designed low-pass filter. The low-pass and high-boost results are shown in Figure 2-39.
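The preprocessing and filtering steps above can be sketched in Python/NumPy. This is an illustration under assumed names (asf_filter_size, asf_high_boost), not the dissertation's MATLAB code: the equivalent-circle radius sets the Gaussian width, and the residual is formed exactly as i_hb = i − i_lp.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def asf_filter_size(mask):
    """Kernel size for ASF: twice the radius of a circle whose area
    equals the rough segmented cell area."""
    area = int(np.count_nonzero(mask))       # pixels above the noise threshold
    radius = np.sqrt(area / np.pi)           # equivalent-circle radius
    return max(3, int(round(2.0 * radius)))  # kernel size = 2r, at least 3 px

def asf_high_boost(image, size):
    """Low-pass the raw image with a 2-D Gaussian whose width is set by
    the preprocessing step, then form the residual i_hb = i - i_lp."""
    low = gaussian_filter(image, sigma=size)  # low-pass result i_lp
    return low, image - low                   # (i_lp, i_hb)
```

By construction the two outputs of asf_high_boost sum back to the raw image, which is the property the rescaling step described next exploits.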
The key step of ASF is to search through the high-boost result and find the index of the pixel corresponding to the most negative value:

idx = argmin(i_hb)

where the pseudocode indicates the extraction of the pixel index at the most negative value of i_hb. A rescaling parameter is calculated using the ratio of raw image and low-pass image intensity values above the average background intensity:

α = (i[idx] − b) / (i_lp[idx] − b)

where α represents the rescaling parameter, and b is the average background intensity. The final step of ASF is a modified high-boost calculation, which takes advantage of the rescaling parameter to augment the low-pass information:

i_ASF = i − α (i_lp − b)

The final result of ASF can be seen in Figure 2-40, along with the result of segmentation for both a raw image and a de-noised image as input. The reasoning behind the rescaling step of ASF is that object edges in images are smoothed by the low-pass filter, just as a step function would appear as a gradual slope after convolution with a low-pass kernel. When compared to the original curve, a low-pass result at an edge will be greater than the original intensities in the region outside of the object. Thus, subtraction of the low-pass result from the original image will contain negative values near the edges of an object. It was observed that more severe edges, like those of the cell signal near the region of the cell body, would generate more negative intensity values after the subtraction. These regions also correspond to the presence of higher glare, generated by the PSF or other out-of-focus portions of the nearby cell signal. Therefore, rescaling by α effectively sets the pixel at the
location of the glare to be equal to the background intensity level, so that subsequent segmentation removes that pixel from the object boundary. Due to the adaptive nature of spatial filtering, the rescaling step also proportionally adjusts other pixels surrounding the cell signal so that they fall below a simple noise-based intensity threshold as well, but in a manner that does not mistakenly omit very dim cell signals such as those from cell protrusions. Figure 2-41 demonstrates the effect on background pixel intensities using a before-and-after comparison of pixel intensity histograms. Since ASF only manipulates low-pass information, high spatial frequencies from noise and important cell edges are not modified and do not affect the ASF result, unlike classical deconvolution schemes that can diverge greatly in the presence of noise. It has the additional advantage of simple calculation rather than computationally expensive iterations and extra matrix operations. In combination with a de-noising technique, ASF can deliver very clear images that are simple to segment using a noise-based intensity threshold, because the pixels along the boundary of the cell signal fall closer to a single intensity than before processing, as shown in Figure 2-42. This characteristic can be used to assess the effectiveness of ASF at restoring a raw image closer to a true image for the purposes of boundary detection.

Detecting Objects

Processing images to reduce noise or glare is the first step toward detecting objects. Segmentation has already been used in several examples in this chapter to identify pixels corresponding to the cell signal in digital images, but there are other options available. This section discusses additional options for segmentation, edge detection, graph cuts and level set algorithms, and reviews the strengths and weaknesses of some openly available image processing software packages in comparison to the
new image processing methods introduced in this work. The section on level sets introduces for the first time a new scheme to segment types of digital images other than fluorescence images through a fusion of the level set formulation and ASF.

Segmentation

One of the most widely applied segmentation schemes in biomedical applications is Otsu's method, which exploits the one-dimensional quality of an intensity histogram to rapidly calculate a threshold level that maximizes the variance between the two groups of pixels above and below that threshold. The procedure is as follows for 16-bit images. First, a histogram is created, counting the number of pixels in each bin from 1 to 2^16 − 1 au. A probability distribution is created by dividing the histogram result by the number of pixels in the image. The cumulative distribution is calculated along with the expected value of the low-intensity group for each possible threshold choice. These vectors are used to calculate a variance vector, and whichever element corresponds to the maximum variance indicates the Otsu intensity threshold. Figure 2-43 depicts the Otsu threshold given an image intensity histogram. The Ridler-Calvard method is another histogram-based threshold method, which uses a k-means approach to obtain a threshold, as seen in Figure 2-44. Generally, these types of thresholding strategies ignore valuable spatial heuristics in images by using histograms. They can generate inaccurate segmentation in the presence of noise, low spatial frequencies or additional objects in an image. A Gaussian filter is sometimes applied to de-noise images before thresholding, but this is not advised since it could blur object edges and degrade spatial resolution.
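The histogram procedure above maps directly to code. The following Python/NumPy sketch is an illustration (the function name and the 256-bin default are choices made here; the text describes the full 16-bit range), computing the between-class variance for every candidate threshold and returning the maximizer:

```python
import numpy as np

def otsu_threshold(image, bins=256):
    """Histogram-based Otsu threshold: maximize between-class variance."""
    counts, edges = np.histogram(image, bins=bins)
    p = counts / counts.sum()                 # probability distribution
    centers = (edges[:-1] + edges[1:]) / 2
    w0 = np.cumsum(p)                         # cumulative prob. of low class
    mu = np.cumsum(p * centers)               # cumulative mean
    mu_total = mu[-1]
    with np.errstate(divide="ignore", invalid="ignore"):
        between = (mu_total * w0 - mu) ** 2 / (w0 * (1.0 - w0))
    between[~np.isfinite(between)] = 0.0      # empty classes carry no variance
    return centers[np.argmax(between)]
```

For a cleanly bimodal intensity distribution the returned threshold falls in the valley between the two modes, which is exactly the regime where histogram-only methods work well.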
Edge Detection

Applying a threshold is not the only way to detect objects in digital images. Some of the first efforts in computer vision relied on high-pass filters to enhance the edges of objects in images (Hoggar, 2006; Gonzalez, 2007). A very simple gradient operator that returns the difference between neighboring pixels in an image after convolution takes the form:

g_x = [−1 1] and g_y = [−1 1]ᵀ

These are analogous to the Haar wavelets which perform the high-pass convolutions in wavelet decomposition, except that their Frobenius norm is √2 rather than one. The Prewitt and Sobel operators are larger in size, but perform the same high-pass functions; they are represented as:

Prewitt: [−1 0 1; −1 0 1; −1 0 1]

and,

Sobel: [−1 0 1; −2 0 2; −1 0 1]

More complex edge detectors emerged, such as the Laplacian of a Gaussian (LoG) operator, which is almost exactly what its name says: the negative second derivative of a 2-D Gaussian kernel. The LoG operator is used in the Marr-Hildreth algorithm to detect objects, which applies the filter then checks for zero crossings, but
the method tends to generate false edges. However, these initial edge detectors fall far short of the performance of the Canny edge detection algorithm. It first filters images with a 2-D Gaussian filter, then with a first-derivative Gaussian filter, and finally applies a non-maximum suppression thresholding process to obtain object edges. All of these edge detectors tend to generate a piecewise boundary for cell images, as shown in Figure 2-45. Although boundary reconstruction algorithms exist, more sophisticated algorithms have emerged in recent years to take their place.

Graph Cuts

In the graph cut approach, pixel intensities are used to construct a similarity matrix, or graph, which considers the distance of two pixels from one another as well as their relative values. The eigenvalues and eigenvectors of this graph, or a normalized version of the graph, are calculated and used to cluster the pixels in the image by their similarity. These steps are computationally expensive and usually implemented in lower-level computer languages to increase speed. Even then, images must be down-sampled to a smaller size to prevent running out of memory, which defeats the purpose of accurately detecting the object in the first place. Graph cuts as segmentation tools are included here for completeness, but they will be revisited in Chapter 3 for use in clustering data (Cour, 2005).

Level Sets

With the knowledge from early edge detection methods, the concept of guiding a deformable curve to the edges of objects was proposed. By directing a closed curve within the space of an image, piecewise boundaries are avoided (Samadani, 1989). Then, instead of using a simple closed curve, level set functions were proposed, which deform a surface instead of a curve (Malladi, 1995; Li, 2011). Level sets, in terms of
image segmentation, are surfaces with positive and negative valued elements that partition the space of the image by their intersection with a plane at zero (Shi et al., 2000; Ray, 2002; Chen et al., 2007). The shape of a level set can initially be any continuous function, and then an energy minimization is performed. One of the first energy functions was the Mumford-Shah functional:

E(u, C) = ∫ (u − i)² dx + μ ∫_{Ω∖C} |∇u|² dx + ν|C|

where u was simply a smooth function in regions separated by the curve C, with length |C|. Minimizing this energy forced u to be near the original image i and also to maintain its smoothness in segmented regions. Chan and Vese soon adapted this energy for level sets as:

E(φ, c₁, c₂) = λ₁ ∫ |i − c₁|² H(φ) dx + λ₂ ∫ |i − c₂|² (1 − H(φ)) dx + ν ∫ |∇H(φ)| dx

which imposes a Heaviside function H on the level set φ to partition regions in the image and replaces u with the constants c₁ and c₂. Then, Li introduced a blend of the two concepts by returning to the energy functional in the form of a bias field, but keeping the constants. Furthermore, the calculation was imposed in a local region of the image using an influence function K, similar to those used in anisotropic diffusion. The final energy functional combines this local clustering term with regularization terms on the level set, where the last term indicates a potential of the form ∫ p(|∇φ|) dx. This formulation was chosen for three reasons. First, it minimizes energy in local regions of the image by the action of K, which was chosen to be a truncated Gaussian kernel. Second, it generates an estimate
of the bias field, which introduces low spatial frequencies to the raw image similarly to the low spatial frequencies extracted by the ASF algorithm previously described. Third, it penalizes the length of the boundary generated by the zero level set, causing poor segmentation approximations to be disfavored. Ideally, the zero level set will converge to the boundary of the objects in an image, but convergence still depends heavily on how the initial level set was specified and on the user-controlled weighting variables. The results of this level set formulation, using either an initial level set determined by basic thresholding or a box, are shown in Figure 2-46.

Level Sets and ASF

The performance of level set segmentation against ASF is poor, but due to the similarities between the bias field and the low spatial frequencies removed by ASF, the two methods were integrated to fuse their unique abilities. The bias field was initialized with the removed ASF low spatial frequencies and run for a single iteration to produce the segmentation featured in Figure 2-47. That result was better than standard level set segmentation, but still less optimal than ASF. However, ASF was developed for fluorescent images, which feature cell signals that are always above a background. ASF has poor performance for other types of images, such as the CT scan of a heart shown in Figure 2-48. In contrast, the fusion of level sets and ASF outperforms standard level set segmentation of the CT scan image in a fraction of the time, also shown in Figure 2-48. This represents a major application for ASF other than fluorescence images, and is the first time this type of augmentation of the level set approach has been implemented.
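The partitioning idea behind the Chan-Vese-style energy above can be made concrete with a short sketch. This Python/NumPy illustration uses assumed names (chan_vese_fit) and a hard step in place of the smoothed Heaviside; it evaluates only the fitting terms for a given level set, not the full iterative minimization or the regularization terms.

```python
import numpy as np

def chan_vese_fit(image, phi):
    """Fitting energy of the two regions partitioned by the zero level set:
    each region is summarized by a constant (its mean intensity), and the
    energy is the summed squared deviation from those constants."""
    inside = phi > 0                                     # H(phi) as a hard step
    c1 = image[inside].mean() if inside.any() else 0.0   # constant for region 1
    c2 = image[~inside].mean() if (~inside).any() else 0.0
    energy = ((image[inside] - c1) ** 2).sum() + ((image[~inside] - c2) ** 2).sum()
    return c1, c2, energy
```

A level set whose zero contour matches the object boundary yields lower fitting energy than a mismatched one, which is the quantity the minimization drives down.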
Detecting the Cytoskeleton

This section will suggest a unique use for the ASF algorithm that enables detection of cytoskeletal protein fluorescent signals. Normally in ASF, a low-pass image is scaled automatically to produce a final result; however, the introduction of a new factor β can generate interesting image processing results for fluorescently labeled cytoskeletal elements such as microtubules or stress fibers:

i_ASF(β) = i − βα (i_lp − b)

Calculating the segmentation as β is increased from 0.5 to 2-3 will produce drastically different approximations of the fiber boundary. Adding each segmentation result together gives a stack of pixels that immensely clarifies the original image. This stack can be segmented again, and then skeletonized to obtain the coordinates of the fibers in the image. Figure 2-49 demonstrates the application of this method for an image of fluorescently labeled microtubules, graciously provided by Dr. Jun Wu. The potential to classify fiber shape and changes in curvature is feasible, but a method to reliably extract the coordinates of single fibers in images has not yet been developed. The modification of the ASF algorithm proposed in this section can address this need in terms of initial object detection, but more work is necessary to uncouple the coordinates of fibers generated in this manner. Alternatively, this method could significantly support co-localization studies between cytoskeletal components and other fluorescently labeled proteins in live cells. Figure 2-50 demonstrates this concept for the example of the actin cytoskeleton and ballistically injected fluorescent beads.

Summary

Computational cost and user-defined parameters have prevented the broader exposure of more complex segmentation schemes to the biological community, and
widely available software packages such as ImageJ or CellProfiler tend to feature the simpler segmentation schemes which use a single threshold (Carpenter et al., 2006; Collins, 2007). Basic segmentation techniques may not be as accurate, but these software packages also come with other macros and plug-ins to help aid subsequent measurement and data acquisition that users may prefer. This helps to streamline data processing and promotes the capability for high throughput, which is becoming a major consideration in cell biology to help handle the massive amounts of data necessary for statistical power. A major consideration in this work is the distinction between tracking particles and nuclei versus tracking cell behavior. The symmetry and regular intensity profiles of fluorescent beads or stained nuclei respond very well to classical object detection, and often, the only data extracted from such images are the centroids of the objects. Trajectories of the object can be constructed from the centroids in each frame of a tracking experiment, and then only averaged quantities like the mean squared displacement or overall displacement/persistence of the object are analyzed. The relationship between the cell and nucleus is so intimate that the nucleus trajectory is often substituted for the cell trajectory to describe cell migration. In the sense that the nucleus will always be localized within the boundary of the cell, this assumption has some weight, but cells are not rigid objects like nuclei or beads, for that matter. Cell migration is characterized by a multitude of morphologies and poses, determined by the dynamic activity of protrusions like lamellipodia or filopodia. Thus, it is a massive disservice to cells to describe their migration by their centroids alone, approximated by their nucleus position or otherwise.
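As an aside on the averaged quantities mentioned above, the time-averaged mean squared displacement of a centroid trajectory can be computed as follows. This is a Python/NumPy sketch with an assumed name and the common time-average convention, for illustration only:

```python
import numpy as np

def mean_squared_displacement(traj, lag):
    """Time-averaged MSD at a given lag for an (N, 2) array of centroid
    positions: the average of |r(t + lag) - r(t)|^2 over all valid t."""
    disp = traj[lag:] - traj[:-lag]       # displacement vectors at this lag
    return np.mean(np.sum(disp ** 2, axis=1))
```

For a straight-line trajectory with unit steps the MSD grows as lag², the ballistic signature, while a random walk grows linearly with lag; this single curve is often the only quantity retained from an entire tracking experiment, which is precisely the reduction criticized here.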
However, before approaching how to comprehensively quantify cell migration, cell signals within images must be extracted with enough precision to overcome the heterogeneity of live cells. This chapter has provided the foundation to perform such measurements. Two new methods have been introduced that outperform existing image processing methods of the same class. Dinosort is a rapid, edge-preserving de-noising algorithm, and the ASF algorithm removes bias from glare and other low spatial frequency information to quickly deliver highly accurate cell boundaries. Together, these methods extract the best possible cell boundaries from fluorescent images in an automated format, which allows for greater repeatability and data sharing, since user interaction does not introduce variation to image processing outcomes and subsequent measurements. The downside to developing dinosort and ASF in MATLAB is limited portability to other platforms. Some effort has been initiated to translate these algorithms to lower-level languages like C++ or C, but a more beneficial translation might be to develop a plugin for ImageJ, which would be more widely available to researchers than independent software. ImageJ uses the Java language, which might also open the door to a portable app for Android devices, which are becoming more commonplace in the clinical environment and have evolved larger screens and better processing power in recent years.
Figure 2-1. Manual cropping of images to assess noise may yield drastically different segmentation results. Figure 2-2. The histogram of pixel intensities at the periphery of an image reveals signals from fluorescent cells that could skew estimates of noise statistics.
Figure 2-3. The mean and standard deviation of periphery pixels are skewed by cell signals at the edges of the image, but the median and median absolute deviation are more robust to outlying signals and deliver a better threshold value for segmentation. Figure 2-4. A histogram of image intensities is compared to the estimated noise distribution (red line) derived from the expectation maximization algorithm.
Figure 2-5. The normalized noise distribution (red) and signal distribution (blue) estimations derived from the expectation maximization algorithm are plotted against the responsibility of the signal (black). Figure 2-6. Pixels corresponding to the cell fluorescence signal in the raw image (left) will have signal responsibility near one (shown in white). The red boundary curve corresponds to responsibilities greater than 0.95.
Figure 2-7. The normalized noise distribution (red) and signal distribution (blue) estimations derived from the expectation maximization algorithm are plotted against the responsibility of the signal (black) for a low SNR image.
Figure 2-8. Low SNR images have less definitive responsibilities (top), while very high SNR images have responsibilities that are misled by error from the OTF (bottom). The red boundary curves correspond to responsibilities greater than 0.95.
Figure 2-9. An intensity histogram of a 2-D Gaussian intensity profile is fit to the reciprocal function with R² = 0.9779. Figure 2-10. An arbitrary sample of Gaussian noise is averaged in a 3x3 neighborhood to de-noise the original sample (1) and produce pixel values closer to the mean intensity (2).
Figure 2-11. A noisy raw image (top left) is convolved using a 3x3 averaging kernel (top right), a 7x7 averaging kernel (bottom left) and a 7x7 Gaussian kernel (bottom right).
Figure 2-12. Wavelet analysis (top) of a sinusoid reveals the periodic nature of the signal at shorter scales (higher frequency) that decays at longer scales (lower frequency) and breaks down at the pinch.
Figure 2-13. The elements of the Haar and Daubechies2 wavelet family are organized by decomposition (top rows) and reconstruction (bottom rows) wavelets, and divided between low-pass (left) and high-pass (right) filter characteristics.
Figure 2-14. A color photograph (topmost) is subjected to a single level of Haar wavelet decomposition to produce a down-sampled approximation (middle left) and three high-pass images corresponding to horizontal (middle right), vertical (bottom left) and diagonal (bottom right) convolutions.
Figure 2-15. The total residual between the original and Haar wavelet deconstructed then reconstructed images is negligible. Figure 2-16. An intensity histogram of a fluorescence image (inset) is subjected to the expectation maximization algorithm to extract an estimated pixel noise density (red line).
Figure 2-17. The intensity histogram of a first-level horizontal high-pass Coif3 wavelet decomposition reveals significant signals outside of the natural range of pixel noise variation of 3 standard deviations (red lines). Figure 2-18. The black pixels (left) represent the absolute signal greater than 3 standard deviations of pixel noise within the first-level horizontal high-pass Coif3 wavelet decomposition. When placed beside the accompanying low-pass result (right), these pixels seem to correspond to regions of the cell fluorescence signal.
Figure 2-19. A typical noisy digital image of cell fluorescence (left) is compared to the result of Coiflet3 wavelet de-noising, using 3 standard deviations as a high-pass threshold before image reconstruction. Figure 2-20. A typical digital image of cell fluorescence (left) is convolved by an averaging filter (right).
Figure 2-21. The residual between a raw image and a low-pass result contains the remaining high-frequency content of the image, including object edges and pixel noise. Figure 2-22. The sorted pixel intensities of a residual between a raw image and a low-pass result (black) are plotted against the sorted pixel intensities of a synthetic noise image constructed based on the residual noise statistics (red).
Figure 2-23. The mean squared error between pixel values of a sorted residual image for pure noise and a sorted synthetic noise image scales with the variance of the residual noise.
Figure 2-24. For pure noise, the standard deviation in dinosort output versus the raw image standard deviation consistently demonstrates a noise reduction of approximately 34% based upon a histogram of the noise reduction fraction. Figure 2-25. Comparison between an original residual image (left), an unscaled rebuilt sorting result (middle) and a scaled rebuilt sorting result (right) demonstrates the significant reduction of noise along with retention of object-related high-pass content.
Figure 2-26. A test image was created using a binary disk shape with a 25-pixel radius and additive Gaussian noise with a zero mean and unit variance (left). Segmenting the expectation maximization responsibilities by 0.5 demonstrates the interference of heavy noise (right) in identifying the object. Figure 2-27. Expectation maximization using the Gaussian mixture model does not have the statistical resolution to distinguish pixel intensities from the background and the object in a test image.
Figure 2-28. The noise reduction fraction versus computation time for Gaussian blurring with a 3x3 kernel (dots), dinosort (pentagrams), and anisotropic diffusion using Lorentzian (+), Leclerc (x) and Tukey (triangles) influence functions is shown for 25 iterations. The red and blue lines correspond to the performance of dinosort after 1 and 2 iterations, respectively.
Figure 2-29. The results of several de-noising algorithms after the indicated number of iterations on the test image were selected based on similarity in noise reduction fraction.
Figure 2-30. Segmenting the de-noising results of several algorithms demonstrates the retention of high spatial frequencies for dinosort after two iterations, based upon smaller holes in the space of the object.
Figure 2-31. Digital images may contain low spatial frequency sources of error that appear as glare locally near regions of brighter cell signal and prohibit accurate segmentation despite de-noising. The dinosort algorithm was applied to an image of cell fluorescence before thresholding by expectation maximization (red curve).
Figure 2-32. The Jansson-Van Cittert algorithm attempts to iteratively remove low spatial frequencies, but may generate false boundaries after segmentation. Expectation maximization was used to threshold the convolution result (red curve).
Figure 2-33. Wiener filtering attempts to find the optimal least squares solution to the classical imaging equation. Expectation maximization was used to threshold the convolution result (red curve).
Figure 2-34. Regularized functions were developed to deconvolve images and avoid negative values in the result. Expectation maximization was used to threshold the convolution result (red curve).
Figure 2-35. The Lucy-Richardson algorithm models noise as a Poisson distribution and finds the maximum likelihood solution to the classical imaging equation. Expectation maximization was used to threshold the convolution result (red curve).
Figure 2-36. Blind deconvolution finds the maximum likelihood solution to the classical imaging equation without the need to input a specific guess for the PSF. The inset image is the approximated PSF determined by blind deconvolution. Expectation maximization was used to threshold the convolution result (red curve).
Figure 2-37. The preprocessing step of automated spatial filtering calculates a low-pass filter size by finding the radius of a circle of equal area to a rough segmentation result.
Figure 2-38. Automated spatial filtering employs a low-pass kernel to convolve raw images. A 2-D Gaussian (left) is used in practice, but the wedding cake filter (middle) also generated good results, possibly due to similarities with the lower 5% of an Airy disk (right).
Figure 2-39. Convolution by a low-pass kernel generates a blurry image (left), and subtraction from the raw image generates a high-boost result (right). Without the scaling step of ASF, the high-boost image results in poor segmentation, generating false edges.
Figure 2-40. Automated spatial filtering without de-noising (left) and with the dinosort algorithm applied beforehand (right) generates very accurate cell boundaries after segmentation by a simple threshold based upon the robust statistics of raw image boundary pixels.
Figure 2-41. Pixel intensity histograms before (top) and after (bottom) ASF processing show the same intensity threshold (red vertical line). Before processing, many of the background pixels have higher values due to low spatial frequency error and disrupt segmentation. In contrast, ASF removes low spatial frequencies and reverts background pixels to lower values (black arrow), fostering accurate segmentation.
Figure 2-42. The intensity of the raw image (black) along the discovered boundary after ASF demonstrates large variation in pixel intensity due to low spatial frequency information in the image. The intensity of the ASF result (red) along the same boundary demonstrates a relatively flat profile, which enhances the effectiveness of simple thresholds.
Figure 2-43. A thresholding method based upon the variance of pixel values above and below the threshold. An intensity histogram demonstrates the result, with the corresponding segmentation inset.
Figure 2-44. An intensity histogram demonstrates the result of the Ridler-Calvard thresholding method, with the corresponding segmentation inset.
Figure 2-45. Laplacian of a Gaussian (LoG) and Canny filters generate piecewise boundary results. The Marr-Hildreth algorithm finds the zero crossings of a LoG filter to detect objects, but generates false edges.
Figure 2-46. The level set algorithm was initialized either by thresholding (top left) or by drawing a box (bottom left). The middle and right columns display the result of 150 iterations for each initialization, as well as the final estimate of the bias field in the raw image.
Figure 2-47. A fusion of level sets and the ASF algorithm significantly enhances the ability to detect cell boundaries, but does not outperform ASF alone.
Figure 2-48. A heart CT scan was initialized for level set segmentation using a box (top left) and converges on the heart chambers after roughly 7 seconds and 150 iterations (top right). The ASF algorithm performs poorly on CT images (bottom left), but a fusion of level sets and the ASF algorithm significantly enhances the ability to detect the heart chambers in almost a tenth of a second, after a single iteration.
Figure 2-49. Stacking the segmentation results after repeated application of the ASF algorithm on an image of the cytoskeleton (top left) using different rescaling values generates a sharpened image (top right), which can be segmented again and skeletonized (bottom) to retrieve the coordinates of fibers in the image.
Figure 2-50. Stacking the segmentation results after repeated application of the ASF algorithm on an image of the actin cytoskeleton (top) using different rescaling values generates a sharpened image, which can be segmented again and skeletonized (bottom) to retrieve the coordinates of fibers in the image. Data from other fluorescence channels (middle) can be combined with fiber localization to co-localize structures and protein activity.
CHAPTER 3
QUANTIFICATION OF CELL SHAPE BY DASF IMAGE PROCESSING

The dinosort de-noising algorithm and the ASF algorithm introduced in the previous chapter are hereafter referred to as DASF, which stands for dinosort-assisted automated spatial filtering. Segmentation results using DASF were always verified for accuracy by visual assessment, even though it is an automated process. The enhanced accuracy of boundaries generated using this approach grants the ability to make more resolved measurements of cell morphology, with more power to quantitatively classify cell behavior than previously existed. This chapter will discuss cell displacement and demonstrate how DASF greatly improves image-based measurements that quantify cell shape. The chapter ends with the consideration of cell boundaries as points on a shape manifold. Projecting cell boundaries onto manifolds provides excellent clustering capability to classify cell shapes, and also provides a means to reconstruct a mean shape for a collection of cell configurations, which is impossible to do using traditional cell parameters.

Cell Displacement

Migration and displacement are often used interchangeably in the language of biomedical engineering, although cell migration is much more difficult to actually describe. This section focuses on the displacement of cells, which is measured by the difference in the location of a cell from one frame to another during a tracking experiment, and does not fully reflect the migration of the cell.

Centroids

Once the cell signal is segmented from a raw digital image, it can be represented as a vector of x and y coordinates in the space of the image. A single dot in the top left
corner of an image would have the coordinates (1, 1), for instance. The centroid of a cell is classically defined as the mean of the x and y coordinate vectors:

$$\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i \qquad \text{and} \qquad \bar{y} = \frac{1}{N}\sum_{i=1}^{N} y_i$$

where N represents the total number of pixels comprising the segmentation result. However, there are other options available. An intensity-weighted centroid also uses the coordinates of the segmentation result, but includes the intensities of the raw image as weights:

$$\bar{x}_w = \frac{\sum_{i=1}^{N} I_i x_i}{\sum_{i=1}^{N} I_i} \qquad \text{and} \qquad \bar{y}_w = \frac{\sum_{i=1}^{N} I_i y_i}{\sum_{i=1}^{N} I_i}$$

where $I_i$ is the raw image intensity at pixel i. Weighting the centroid calculation opens the door to many other weighting choices, which are not explored here. However, it is worth mentioning that the average coordinate values of the boundary pixels will differ from the classical centroid calculation, since protrusions away from the cell body require more pixels along the boundary to represent them, while the bulk of the cell body weighs heavily in the classical centroid calculation. Another consideration about classical centroids is the dependence of displacement measurements on object orientation. Figure 3-1 uses Tetris-style block pieces to demonstrate the variation introduced into the measurement of displacement as a new block is added, which is dependent on the orientation of each piece. Adding a
block along the major axis of objects results in a greater displacement measurement, for example. This is critical to the measurement of cell displacement, since live cells often exhibit several configurations as they move. Using only the trajectory formed by successive centroid measurements overlooks the influence of cell shape and might introduce variation in the step size of a cell. Protrusions that dictate changes in the cell centroid may be equal in magnitude, but would be interpreted differently based on where they form around the cell. For this reason, other measures of the centroid may be preferred.

The Distance Transform

The weight of protrusions on centroid calculations can be avoided by tracking the centroid of a maximally inscribed circle within the cell boundary. A circumscribed circle may also be tracked, and is used for many particle tracking applications, but the motivation to omit the influence of variable protrusions is cause to avoid circumscribed circles. An easy method to find the maximally inscribed circle is to perform a distance transform on the cell boundary. Representing the boundary coordinates as $(x_j^b, y_j^b)$ and the overall segmented pixel coordinates as $(x_i, y_i)$, we have:

$$D_{ji} = \sqrt{\left(x_i - x_j^{b}\right)^2 + \left(y_i - y_j^{b}\right)^2}$$

where D forms a matrix of distances between each pair of boundary pixels and overall segmented pixels. The minimum value of each column in D can be found, and the maximum element in that list indicates the i-th coordinate of the segmentation pixels that corresponds to the centroid of the maximally inscribed circle. The radius of the circle is, then, that maximal element distance value.
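As an illustration, the distance-transform search described above might be sketched as follows. The dissertation's pipeline is implemented in MATLAB; this Python/NumPy version is only a sketch, and the function and variable names are assumptions, not the author's code.

```python
import numpy as np

def max_inscribed_circle(seg_coords, boundary_coords):
    """Center and radius of the maximally inscribed circle (illustrative).

    seg_coords: (N, 2) array of all segmented pixel coordinates.
    boundary_coords: (M, 2) array of boundary pixel coordinates.
    """
    # D[j, i] = distance between boundary pixel j and segmented pixel i
    diff = boundary_coords[:, None, :].astype(float) - seg_coords[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=2))
    # Minimum of each column: distance from each segmented pixel to the
    # nearest boundary pixel
    nearest = D.min(axis=0)
    # The segmented pixel farthest from any boundary pixel is the center,
    # and that distance is the radius
    i = nearest.argmax()
    return seg_coords[i], nearest[i]
```

For example, for an 11x11 square region the inscribed-circle center falls on the middle pixel, with a radius equal to half the side length. Note that the full pairwise matrix is memory-hungry for large cells; a chunked loop or a dedicated distance transform would serve the same purpose.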
Principal Component Analysis

Cells are more rounded when they are preparing for division or dead, but most of the time they exhibit some degree of elongation, if not polarity or more complex shapes. Maximally inscribed circles do not necessarily reflect this elongation in the morphology of live cells, so a maximally inscribed ellipse is suggested to track the cell body. Calculating an inscribed ellipse is the same as calculating an inscribed circle, but the segmentation coordinates must first be de-elongated. Principal component analysis (PCA) is an excellent statistical tool, but in this case it will be used to rotate and scale the coordinates of segmented cells. Figure 3-2 illustrates the key steps in this process, and more detail is found below. MATLAB has a built-in function to perform PCA that returns the eigenvectors and eigenvalues of a set of boundary coordinates when they are input as column vectors. The eigenvectors form a 2x2 matrix that acts as a rotation matrix, which will rotate the cell boundary so that its major axis falls along the x axis after centering the coordinates on their classical centroid. The degree of rotation corresponds to the inverse cosine of the first element in the eigenvector matrix. Once the coordinates are rotated, the new x and y vectors can be scaled by dividing by the square roots of the PCA eigenvalues, where the larger eigenvalue square root should divide the new x coordinate. Once this is done for the boundary and overall segmentation coordinates using the same set of PCA results, the maximally inscribed circle may be calculated for the rotated and rescaled coordinates as normal. The centroid of the maximal ellipse must be added to the classical centroid to get its position in the original space of the image, and its radius must be scaled back by the square roots of the eigenvalues and rotated into the original cell orientation as well.
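The rotate-and-rescale steps above might be sketched like this (Python/NumPy rather than the MATLAB built-in mentioned in the text; the function names are illustrative assumptions, and the inscribed-circle search itself is omitted):

```python
import numpy as np

def pca_normalize(coords):
    """Rotate and scale 2-D coordinates so the point cloud is de-elongated.

    coords: (N, 2) array of segmented or boundary pixel coordinates.
    Returns the transformed coordinates plus everything needed to map
    results back into the original image space.
    """
    centroid = coords.mean(axis=0)
    centered = coords - centroid
    # Eigen-decomposition of the 2x2 covariance matrix (analogous to the
    # eigenvectors/eigenvalues returned by MATLAB's PCA routine)
    evals, evecs = np.linalg.eigh(np.cov(centered.T))
    order = np.argsort(evals)[::-1]           # largest eigenvalue first
    evals, evecs = evals[order], evecs[:, order]
    rotated = centered @ evecs                # major axis now along x
    scaled = rotated / np.sqrt(evals)         # divide by sqrt(eigenvalues)
    return scaled, centroid, evecs, evals

def to_image_space(point, centroid, evecs, evals):
    """Map a point found in the normalized space back to the original image."""
    return centroid + (point * np.sqrt(evals)) @ evecs.T
```

After `pca_normalize`, the point cloud has unit variance along both principal axes, so a maximally inscribed circle found there corresponds to an inscribed ellipse in the original image once its center (and radius, per axis) is mapped back with `to_image_space`.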
Protrusion Analysis

Obtaining the maximally inscribed ellipse offers advantages other than tracking the displacement of the bulk cell. By overlapping the ellipse on the cell boundary, regions of the segmentation result are distinguishable as front or rear end protrusions. The displacement of protrusions will not necessarily influence the cell body to significantly displace, yet this is often overlooked by classical centroid analysis. Figure 3-3 depicts how the subdued motion of a maximally inscribed ellipse can be paired with the displacement of each protrusion to understand the dynamics of the cell as it moves along, using block pieces as models. Figure 3-4 features several examples of NIH 3T3 fibroblasts, which have been partitioned using a maximally inscribed ellipse. The ability to locate protrusion displacement opens the door to counting and classifying cell protrusions, but may also be used to improve upon previous cell nucleus co-localization studies. Early concepts of this work supposed that the relationship between the cell and nucleus centroid displacement might grant more detailed information about cell migration. Unfortunately, the orientation of the cell can drastically affect the measured displacement of the classical centroid, introducing a high amount of error on the magnitude of the measured displacements themselves for time scales of one minute. Inserting a maximal ellipse to segregate the cell body and protrusions offers two new alternatives to such co-localization work: tracking the co-localization of the nucleus and maximal ellipse displacement, and tracking the co-localization of the nucleus with the nearest dynamic cell protrusion. For cells exhibiting polarity as they move, the nucleus will ideally be co-localized to the rear of the cell, near to trailing end detachments. There are several physical
mechanisms that can act to link the nucleus to the rearward cell-substrate adhesions, including the LINC complex, actin-myosin fibers, and matured focal adhesions. These mechanisms suggest an anchoring role for the nucleus, which is a much more rigid presence inside the cell than the cytoskeletal matrix. Internal structural proteins such as lamins that are physically connected to this process might also contribute to the flattening of the nucleus perpendicular to the substrate surface, in a manner similar to tightening shoelaces, once external actin-myosin contractions are triggered to de-adhere a major trailing end adhesion. For less severe detachment, the nucleus might exhibit at least some degree of correlated movement with the trailing end protrusion, and is expected to lag behind the cell body based on the same connectivity concept, which could explain the polarized phenotype in a wide variety of cells. However, these ideas were outside the scope of cell shape and have not been explored in detail to date.

Cell Shape

Extending from the ability to partition a segmentation result into the cell body and its surrounding protrusions, the overall cell shape is expected to offer more insight into complex migration behavior than simple displacement analyses. By definition, the shape of the cell does not rely on the position or orientation of the cell within an image frame, nor does it rely on the scale of the segmentation result. However, some descriptors of shape are coupled to the characteristic lengths and widths of the detected cell, and they have offered the means to classify different cells by such parameters. The developing field of computational diagnostic aids and the morphometric analyses used in the drug discovery industry sometimes incorporate image parameters in statistical analyses to predict disease states or drug effectiveness. These additional parameters are all
coupled to the scale of cells in the field of view, and might not grant extra statistical resolution; rather, they could wash out important observations within large datasets. In an effort to better represent the physical configuration of live, moving cells, this section will discuss some options for quantitatively characterizing cell shape and scale. As further motivation for the analysis of cell shape, Figure 3-5 revisits the example of simple blocks, classifying them by their shape using measures of circularity and eccentricity, which will be discussed shortly. In the figure, the displacement of blocks measured by their classical centroid varies, but in the framework of shape analysis, each block configuration is distinguishable.

Circularity

Very basic cell parameters include the area and perimeter length of the cell. The full resolution of the boundary is hindered by the nature of imaging with pixels, but cells are typically large enough to adequately quantify these parameters by simply counting the number of coordinate pairs in the segmented image result. The area is related to the centroid, then, as it can be considered the 0th moment of the pixel location distribution, whereas the classical centroid is the 1st moment. It is important to re-emphasize the importance of segmentation in obtaining this type of cell parameter. Figure 3-6 demonstrates how the distribution of measured cell area can dramatically change depending on segmentation results. The accurate cell boundaries provided by the DASF algorithm introduced in this work deliver the enhanced ability to measure cell parameters that was not available previously in an automated format. The area has the dimension of length squared, and cells with a larger radius will exhibit markedly larger areas by image analysis. For this reason, the square root of the area might prevent skewing of statistical analysis results from PCA or other tools. The
perimeter length has the dimension of length, granting the opportunity to create a dimensionless variable comprised of some ratio between area and perimeter. Several choices exist to combine area and perimeter measurements. One such measurement calculates the perimeter of a circle with the equivalent area as the cell, then takes the ratio of the cell perimeter to the circle perimeter. Another choice is to simply divide the squared perimeter by the area measurement to create a dimensionless shape parameter known as the shape factor, form factor or circularity:

$$C = \frac{P^2}{A}$$

where C, P and A represent the circularity, perimeter and area measurements, respectively. Despite the dimension of the area, the circularity turns out to be proportional to the square of the perimeter-to-perimeter dimensionless parameter mentioned previously, and was chosen as one variable to measure the shape of cells for this work. Figure 3-7 contains a diagram with the circularity measurements of common shapes for additional reference.

Eccentricity

Some other basic cell parameters are the major and minor axes of the cell. Both of these measurements have the dimension of length, and their ratio naturally forms another dimensionless variable known as the length-to-width ratio, or the eccentricity:

$$E = \frac{a}{b}$$

where E, a and b represent the eccentricity, major axis and minor axis, respectively. The morphology of cells is rarely so simple as to facilitate the easy calculation of these axes, and so a typical solution is to circumscribe an ellipse about the cell and take the major and minor ellipse axes as the cell major and minor axes.
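A minimal sketch of the circularity measurement, assuming the boundary is available as an ordered list of vertex coordinates (the dissertation counts pixels; this polygon version with the shoelace area is only an illustration, and the function name is an assumption):

```python
import numpy as np

def circularity(boundary):
    """Circularity C = P^2 / A for a closed boundary.

    boundary: (N, 2) array of ordered vertex coordinates.
    """
    rolled = np.roll(boundary, -1, axis=0)
    # Perimeter: sum of edge lengths around the closed curve
    P = np.sqrt(((rolled - boundary) ** 2).sum(axis=1)).sum()
    # Area: shoelace formula for a simple polygon
    A = 0.5 * abs((boundary[:, 0] * rolled[:, 1]
                   - rolled[:, 0] * boundary[:, 1]).sum())
    return P ** 2 / A
```

A circle gives the minimum value of 4*pi (about 12.6), a square gives 16, and increasingly complex boundaries give larger values, consistent with the role of circularity as a boundary-complexity measure.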
Here, a different way to calculate the eccentricity is suggested, using PCA to obtain the eigenvalues of the coordinate pairs of the cell boundary. The square root of the ratio between these eigenvalues is equivalent to the traditional method of calculating eccentricity, in that it represents the standard deviations of the coordinate point cloud along its primary axes, which can be considered to form an ellipse around the coordinate distribution:

$$E = \sqrt{\frac{\lambda_1}{\lambda_2}}$$

where $\lambda_1$ and $\lambda_2$ are the larger and smaller eigenvalues of the boundary coordinates. Regardless of the choice of eccentricity calculation, this measurement has the ability to classify a large range of shapes, as seen already in Figure 3-5. However, it is essential to remember how important segmentation is to these types of classification strategies. Figure 3-8 compares the distribution of the circularity and eccentricity for the many configurations of a single migrating NIH 3T3 fibroblast. The accurate segmentation of the DASF algorithm reveals a representative shape distribution, whereas less accurate segmentation methods can generate significantly skewed shape distributions, misrepresenting the actual shape of the cell.

Symmetry

The symmetry of cells is often observed while they are moving, stationary or dividing, but the symmetry of the complex closed curves of cells can be difficult to assess. Rounded cells and those with isotropically distributed protrusions might have some degree of radial symmetry, much like a starfish, but the calculation of this type of symmetry is less physiologically relevant upon consideration of the anisotropic
distribution of cytoskeletal architecture that accompanies cell migration. Hence, the axial symmetry along the major and minor axes is of primary interest. The method to calculate a degree of symmetry again relies on PCA to return the 2x2 eigenvector matrix. This rotation matrix can be right-multiplied to the centroid-centered boundary coordinates to orient the major and minor axes of the cell on the x and y axes, respectively. The new coordinate vectors adopt a form that is conducive to symmetry calculations, because values on either side of the major and minor axes are now of opposite sign, so that:

$$S_x = \frac{N_{y>0}}{N_{y<0}} \qquad \text{and} \qquad S_y = \frac{N_{x>0}}{N_{x<0}}$$

where S represents the symmetry parameter for a given axis and N represents the number of coordinates which satisfy the condition of their subscript. Both x and y coordinates that may be equal to 0 are omitted from the calculation, since they would represent perfect symmetry along an axis and count for neither side. Since N represents a portion of the cell perimeter, it has the dimension of length, and the symmetry measurement is dimensionless. Figure 3-9 demonstrates the symmetry of NIH 3T3 fibroblasts against those treated with TIAM1 shRNA. The data indicate that treated cells exhibit less minor axis symmetry, which represents a more polarized cell. This is supported by the data in Figure 3-10, which indicate that the average distance between the cell centroid and the centroid of a maximally inscribed ellipse, as a percentage of the major axis length, will
decrease with treatment by shRNA from nearly 8% to 6.5%. As the cell area becomes disproportional due to polarity, the centroid calculation will favor the side of the cell represented by more pixels, which also corresponds to the location of the inscribed ellipse.

Measuring Changes in Cell Shape

The ability to accurately measure cell shape parameters grants the ability to investigate the physical effects of drug treatment or protein expression in cells. Figure 3-11 demonstrates some early shape data from tracking single NIH 3T3 fibroblasts subjected to miR-34a treatment, and the transient transfection of plasmids for three GTPase mutations: RacV12 (constitutively active), RhoA T19N (dominant negative), and CDC42 T17N (dominant negative). The circularity and eccentricity of single cells were recorded over the course of one hour in one-minute increments, then used to generate 2-D distribution patterns. Treatment by miR-34a influences cells to exhibit more complex boundaries and to become elongated, as indicated by more distribution density at higher circularity and eccentricity values in comparison to the control. Due to the mechanism of micro-RNA activity in cells, the exact targets of attenuated protein expression are unclear, but it is believed to affect cell migration. Since the GTPases dictate many aspects of cytoskeletal reorganization in migrating fibroblasts, comparison of cell shape parameters for miRNA-treated and GTPase-transfected cells was attempted to gain physical insight into possible links between these factors. Transient transfection of the constitutively active form of RAC1 influences the fibroblasts to become very rounded and less elongated than control cells. This might be explained by the enhanced branching of the actin network in forming protrusions, through the activity of Arp2/3, which is downstream of RAC1 activity. However, this cell
shape distribution is drastically different than the miR-34a distribution, suggesting that the micro-RNA targets do not affect the upregulation of RAC1 in NIH 3T3 cells. Similarly, the introduction of dominant negative CDC42 caused cells to exhibit less elongation and only slightly more boundary complexity than the control, but these shape changes were not consistent with those from miR-34a treatment. CDC42 is considered to be a major regulator of cell polarity, which is supported by the observation of less elongated fibroblasts. Finally, the introduction of dominant negative RhoA caused fibroblasts to elongate and exhibit more circularity in a similar manner as those treated by miR-34a. Loss of normal RhoA activity is expected to decrease the level of actin-myosin contractility at the trailing end of polarized migrating fibroblasts. This could leave adhesions in the trailing end of cells in place, stretching the cell as it moves away, which is supported by the increased eccentricity of the observed cells. Based upon the similarities in the cell shape data between RhoA T19N and miR-34a treatment, a link might exist between these two factors, but more data from traditional migration and molecular based studies is necessary to make a stronger conclusion. After examining single cells, it was realized that shape parameter distributions for single cells might not reflect the distribution of the entire population. Figure 3-12 demonstrates the circularity versus eccentricity distributions for an epithelial cell line, OSE10, and for NIH 3T3 fibroblasts treated by TIAM1 shRNA or by the diabetes drug, Metformin. Figure 3-13 contains 3 plots of the circularity versus eccentricity for human foreskin fibroblasts (HFF), OSE10 and NIH 3T3 cells using a larger population of single cells to refine the distributions. These data indicate some differences in the shape of cells for each condition or cell line, but it becomes clear that cell shapes might become
difficult to classify other than by their roundedness, elongation or complexity. Figure 3-14 is a plot of circularity versus eccentricity for a control population of NIH 3T3 fibroblasts, with the boundary of the cell drawn instead of single points. The three colored circles indicate the general classifications of shape that seem to emerge initially from shape analysis. Although more resolution might be available through statistics, it can be seen from Figure 3-15 that the resolution between clusters of points can differ between cell types, obscuring the ability to classify very rounded cell types with the same detail as more complicated morphologies. The lack of shape resolution becomes more important when considering differences in cell boundaries arising from molecular perturbations, which could introduce differences too subtle to detect between populations. NIH 3T3 cells were treated by transient transfection of plasmids for constitutively active or dominant negative forms of the RhoA (V14 or T19N), Rac1 (V12 or T17N) and CDC42 (V12 or T17N) proteins, obtained as a gift from Alan Hall. The circularity versus eccentricity plots for these cells are found in Figure 3-16. Compared to a control group of NIH 3T3 cells, the only treatment condition that does not exhibit a large proportion of rounded cells, indicated by a circularity of 12 and eccentricity of 1, was that of the dominant negative RhoA group. Physically, these cells lacked the ability to pull out trailing end substrate adhesions due to the diminished actin-myosin contractility caused by interference with normal RhoA activity. They appeared to have significantly elongated and thin protrusions, resembling in appearance the axons of a neuron. Since the ability to create front end protrusions was not impaired, this group of cells would stretch along their direction of displacement, pulling old protrusions out like taffy. This morphology differs
significantly from any seen in the control group, a detail that is lost when qualitatively comparing the two distributions of shape parameters. Conversely, the active form of RhoA induced very rounded cells, possibly due to very strong substrate attachment or negative feedback on normal Rac1 and CDC42 migration activity, diminishing the formation of new protrusions. This morphology is very similar in distribution to OSE10 epithelial cells (see Figure 3-13), but the underlying molecular activity that induced each morphology is not expected to be similar. However, similarities seen between the active and dominant negative forms of Rac1 and CDC42 might imply underlying molecular pathway similarities, since these two proteins share a signaling pathway that influences actin polymerization at the front end of migrating fibroblasts. Reducing that activity induces rounded, immobile cells, while bolstering actin polymerization over-creates protrusions in NIH 3T3 cells, inducing complex single-cell morphology. Statistical analysis of the circularity, eccentricity and number of protrusions for each of the above conditions was performed by analysis of variance (ANOVA) to determine significant differences from the control group of NIH 3T3 cells. The ANOVA results are featured in Figure 3-17, but lack the significance one might expect to see given the large qualitative differences seen in the circularity versus eccentricity plots. These data imply that quantification by the circularity and eccentricity is insufficient to fully characterize the morphology of single cells on their own, but it may be possible to combine existing molecular techniques with live cell tracking to obtain deeper insight into cell migration and heterogeneity. Still, the problem may lie in the type of quantification typically used to characterize cell shape in the first place. Therefore,
another method to quantify cell shape was pursued in an effort to distinguish between cell configurations in a more physiologically meaningful way.

Manifolds and Geodesics

The dimensionless cell parameters described in the previous section have the ability to roughly classify cell shape, but problems arise due to the nature of these measurements. Figure 3-18 illustrates the variety of configurations a single cell may adopt as it moves, using a circularity versus eccentricity plot. One issue is the convergence of points as cells become more circular and less complex. Very subtle changes in the boundary shape, such as the retraction of a trailing end protrusion, may also get clustered together by these measurements. Using these shape parameters, it is also impossible to reconstruct a mean shape based on the mean values for a cluster. Thus, a new representation of cell shape was integrated into this work, which projects the closed planar curves of cell boundaries onto a manifold. A manifold is a topological space that can be represented in Cartesian space for a local region of points (Do Carmo, 1976; Woods, 2003; Klassen et al., 2004). For instance, the surface of the Earth can be considered to be a manifold, since its curvature prevents travel from New York to Los Angeles without burrowing through the ground, but in the local region of a kitchen, travel from the refrigerator to the stove is possible. Thus, travel along the Earth for longer distances can be achieved with a succession of short paths in a space tangent to the surface. These paths are known as geodesics, and the smoothness of the manifold allows it to be differentiated to obtain a tangent space in which to calculate a geodesic. The shortest geodesic paths for a flight between
two cities will appear as a curved line on an atlas of the globe, and similarly, finding the shortest geodesic path on other manifolds is nontrivial due to the curvature of the manifold space. The existence of a tangent space, however, allows for the calculation of an inner product for vector fields represented as points along a manifold, permitting the measurement of distances and angles (Do Carmo, 1976). Manifolds with these properties are known as Riemannian manifolds, and they are the final goal when attempting to represent cell boundaries on a manifold.

Projections onto Shape Space

The boundary coordinate vectors that describe the perimeter of cells can be considered arclength functions for closed curves in two-dimensional space (Klassen, 2004; Srivastava, 2005; Liu, 2010). The velocity of the arclength function represents the direction and magnitude of travel from each point along the curve to the next. The velocities for a given curve make an angle with the x-axis that can be recorded to create a direction function. The direction function is simply a record of the direction in which the curve turns from each point to the next. In this form, closed curves lose their location within the space of the original image, and after normalizing the velocity function by its Frobenius norm, the sense of scale is also abandoned. In shape analysis, location and scale are considered nuisance factors, along with the orientation of the curve with respect to the x-axis. Rigid rotations in the plane of an image of the same curve must be considered to fully decouple all nuisance variables from the direction function and represent curves in a Riemannian manifold. Adding a constant to each element of the direction function results in a rigid rotation of the representative curve in the original image space. Thus, a direction function can be enforced to satisfy the following constraint:
where the direction function \theta is parameterized by arclength s \in [0, 2\pi],

    \phi_1(\theta) = \frac{1}{2\pi} \int_0^{2\pi} \theta(s)\,ds = \pi,

and \phi_1 is the first component of a map \phi = (\phi_1, \phi_2, \phi_3) of the direction function into \mathbb{R}^3. Two more components arise from the closure condition of the curve:

    \phi_2(\theta) = \int_0^{2\pi} \cos\theta(s)\,ds = 0  and  \phi_3(\theta) = \int_0^{2\pi} \sin\theta(s)\,ds = 0,

where the shape space is the preimage \phi^{-1}(\pi, 0, 0). This mapping was first reported by Klassen, who also suggested an iterative algorithm to project direction functions onto the tangent space under the mapping constraints using the Jacobian matrix

    J_{ij} = \langle \nabla\phi_i, \nabla\phi_j \rangle,  with  \nabla\phi_1 = \tfrac{1}{2\pi},  \nabla\phi_2 = -\sin\theta,  \nabla\phi_3 = \cos\theta,

where \langle \cdot,\cdot \rangle indicates the inner product. In practice, the rotation constraint is handled separately from the closure condition: arclength functions are projected to the tangent space using only the closed-curve constraints, and then those vectors are rotated to match each other.

Geodesic Distance and Boundary Curvature

The projection of direction functions onto the tangent space of a shape manifold has a strong association with the curvature of the cell boundary, since the curvature of a planar curve is the derivative of its direction function,

    \kappa(s) = \frac{d\theta(s)}{ds}.

Because the tangent space is calculated from the differential of the map from the shape space, geodesic paths strongly reflect the changes in curvature necessary to
transform one shape into another. This can be visualized by considering a wire coat hanger that must be bent and shaped to create a rod for cooking hot dogs over a campfire. There is a minimum number of bends such that the wire is straightened with the least amount of energy, and this is representative of the shortest geodesic path. Figure 3-19 features a paper box on which the shortest path from one face to another has been drawn. If the paper box is flattened, that shortest path becomes a straight line, which is representative of how the tangent space of the shape manifold is used. The minimal geodesic distance between two tangent vectors can be calculated from their inner product, under the assumption that they are near enough on the manifold to form a straight path in the tangent space. The shapes of cells do not change dramatically enough as they move to break this assumption, but a very heterogeneous population of cells might. However, very basic heuristics about the shapes of cells in a population can help to prevent comparisons between very different shapes. This is discussed further in the next section.

Shape Clustering

Just as the distance transform described previously generates a matrix of absolute distances between two sets of coordinates, a collection of distances between each pair of shapes in a set can be constructed. This forms a square matrix with a distance of zero for all diagonal elements. An example of a geodesic distance matrix, constructed from the single-cell shapes in Figure 3-18, is found in Figure 3-20. Srivastava suggested a Markov chain Monte Carlo (MCMC) search process to cluster the data contained in the geodesic distance matrix, which relied on the calculation of a dispersion for each cluster and iteratively moved a random element to a new cluster based on an exponential probability function. The clustering configuration that
minimizes the dispersion of each cluster represents the final result, but for a larger data set of cell shapes, the MCMC method required a prohibitively long time to settle on an answer. Motivated to cluster cell shapes effectively using geodesic distances, normalized graph cuts were revisited as an alternative to the MCMC search algorithm (Cour, 2005). The previous disadvantage of large similarity matrices in the graph-cut approach to image segmentation stemmed from the conversion of an n × n image into an n^2 × n^2 similarity matrix. The geodesic distance matrix, however, is already conducive to the form of an imaging similarity matrix. The only necessary step is to convert the geodesic distances with a Gaussian function:

    W_{ij} = \exp\left(-\frac{D_{ij}^2}{2\sigma^2}\right),

where W is the resulting similarity matrix, D is the geodesic distance matrix and \sigma is a scale parameter. Figure 3-21 demonstrates the similarity matrix of the geodesic distance matrix in Figure 3-20. The first k eigenvectors of W, identified by the k largest eigenvalues of W, are determined, where k is the eventual number of clusters desired. Next, these column eigenvectors are normalized, and the largest element in each row is identified and set as the initial cluster result. This initial clustering is left-multiplied by the normalized eigenvectors, and the result is subjected to singular value decomposition to create a rotation matrix from the resulting two unitary matrices. Right-multiplying the rotation matrix with the eigenvectors changes their direction in space and maximizes the singular values upon recalculation. The position of the new largest element in each row of the rotated eigenvectors defines the clustering result. This process is orders of magnitude faster than MCMC searching, and clusters the shapes of cells extremely well
based on their geodesic distances from each other, as seen in Figure 3-22. Figure 3-23 shows the clustering result in the context of circularity and eccentricity, demonstrating the statistical resolution the shape-manifold approach provides over traditional parameters for classifying single-cell shapes. This is the first application of normalized graph cuts to clustering based upon geodesic distances.

Mean Cell Shape

The vectors in the tangent space of a given direction function and the geodesic distances between shapes in the manifold can be used together to specify a mean of a set of shapes. The mean in terms of the manifold is known as the Karcher mean, or an intrinsic mean (Woods, 2003; Klassen, 2004). The mean of traditional cellular parameters is called an extrinsic mean, and since the Karcher mean is defined on a manifold and extrinsic means are not, the manifold approach to mean shapes allows for the reconstruction of a curve, whereas extrinsic means do not. However, since points on a manifold lie in a curved space, averaging over all points does not provide an intrinsic mean. Instead, the gradient-descent approach introduced by Woods must be implemented, which takes successively smaller steps from a beginning point sequentially toward each point in the set of shapes, with the length of each step determined by the geodesic distance between points. An example of this method to calculate a mean is featured in Figure 3-24. For the tangent space, the result of each step is projected back to the manifold, and the resulting final location represents the Karcher mean shape. Using the same clustering result as Figure 3-23, the mean shape of each cluster is shown in relationship to the cell trajectory in Figure 3-25. Based on this result, the potential to quantitatively characterize subtle differences in cell motion becomes available for the first time.
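In flat space, this sequence of successively smaller steps reduces exactly to the arithmetic mean, which makes the construction easy to sanity-check; on the shape manifold, each step would instead follow a geodesic and be projected back. A minimal sketch follows (Python stands in for the Matlab used in this work, and the 16 random planar points are invented for illustration):

```python
import numpy as np

def incremental_mean(points):
    """Walk through the points, stepping 1/n of the way toward each new one.

    In flat space this recursion lands exactly on the arithmetic mean; on a
    manifold each step would be taken along a geodesic and the result
    projected back, yielding the Karcher mean.
    """
    mean = points[0].astype(float)
    for n, p in enumerate(points[1:], start=2):
        mean = mean + (p - mean) / n
    return mean

rng = np.random.default_rng(1)
pts = rng.normal(size=(16, 2))   # 16 random planar points, as in Figure 3-24
print(np.allclose(incremental_mean(pts), pts.mean(axis=0)))  # True
```

The agreement with the ordinary mean holds only because the plane is flat; on the curved shape space, the projection step after each move is what distinguishes the intrinsic mean from the extrinsic one.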
Summary

The geodesic lengths that characterize curvature differences of closed curves are measured along the tangent space, and the direction of the tangent along minimal geodesics can reflect very subtle changes in the cell boundary. A basis of tangents could conceptually be formed to classify the distribution of protrusion types for normal and diseased cell types, and would provide a quantitative basis to compare the shapes of cells and protrusions in a formal framework. A distribution of protrusion types could also identify the frequency of dangerous morphologies within a biopsy sample or reveal very subtle variation for drug-screening applications. This manifold-based approach to cell shape has the added advantages of providing meaningful mean shapes and allowing efficient clustering of shapes. A disadvantage of this approach is the parameterization of the closed curves that describe cell boundaries. As the computer detects objects and generates the coordinates of boundaries, it discovers the pixel nearest the top left of the image first. This beginning point for the curve can negatively affect the comparison of shapes and the calculation of geodesic distances. In this work, each curve was reparameterized to begin from the leftmost pixel after rotation to the major axis. This is not an ideal solution to the problem, but it is fast in comparison to cycling through all possible starting points when comparing two shapes, as proposed by Klassen. Since cell boundaries almost always vary in their perimeter, cubic spline interpolation was introduced to regularize the length of perimeter vectors to 200 points. This process could introduce some subtle variation for very similar boundaries that might translate into the geodesic distance between two curves. However, it was observed that higher-curvature regions of boundaries tended to dominate the geodesic distance
measurement. This seemed to foster the clustering of single migrating cell shapes, but when entire populations of cells were clustered, the grouping often contained very differently shaped cells (data not shown) and was unsuccessful. The poor grouping of cell populations might be due to similarities in high-curvature regions of the cell boundary, or to the cells being too distant from each other along the manifold, a testament to the heterogeneity of cells.
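The reparameterization just described, restarting each boundary at its leftmost pixel and resampling to 200 points, can be sketched as follows. Linear interpolation stands in for the cubic spline of the text, and Python for the Matlab used in this work; the 57-point circle is an invented test curve.

```python
import numpy as np

def resample_boundary(x, y, n_out=200):
    """Reparameterize a closed boundary to a fixed number of points.

    The curve is restarted from its leftmost pixel and resampled uniformly
    by arclength. (Linear interpolation stands in here for the cubic
    spline interpolation used in the text.)
    """
    start = int(np.argmin(x))                    # leftmost boundary pixel
    x = np.roll(x, -start)
    y = np.roll(y, -start)
    xc = np.append(x, x[0])                      # close the curve
    yc = np.append(y, y[0])
    seg = np.hypot(np.diff(xc), np.diff(yc))
    s = np.concatenate([[0.0], np.cumsum(seg)])  # cumulative arclength
    s_new = np.linspace(0.0, s[-1], n_out, endpoint=False)
    return np.interp(s_new, s, xc), np.interp(s_new, s, yc)

t = np.linspace(0, 2 * np.pi, 57, endpoint=False)
rx, ry = resample_boundary(np.cos(t), np.sin(t))
print(len(rx), round(float(rx[0]), 3))   # 200 points, starting near x = -1
```

Because every resampled curve now begins at a comparable boundary location and has the same length, the direction functions of two cells can be compared element by element.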
Figure 3-1. The displacement between centroids of block pieces is measured for the addition of a square block both vertically and diagonally. The displacement is different depending on the original orientation of the block piece, despite the equal addition to each piece.

Figure 3-2. Principal component analysis is used to inscribe a maximal ellipse into a cell boundary. The eigenvectors of the boundary coordinates form a rotation matrix that rotates the boundary, aligning the major axis with the x-axis. The maximal distance between the area coordinates and the boundary indicates the center of the ellipse, and is calculated by a distance transform after normalizing the coordinates by their eigenvalues.

Figure 3-3. Inscribing a maximal ellipse partitions shapes and can help identify the number of protrusions as well as aid tracking by the centroids of the partitions.

Figure 3-4. Sixteen NIH 3T3 fibroblasts have been partitioned by their maximally inscribed ellipses, demonstrating the range of protrusions this cell type can achieve.

Figure 3-5. Using additional shape information may help in understanding the behavior of migrating cells in addition to displacement information.

Figure 3-6. Segmentation of a cell (top) performed by simple thresholding produces noisy boundaries, which can drastically skew the distribution of cell parameters such as area (bottom). The black arrow indicates the accurate area distribution determined by the DASF algorithm.

Figure 3-7. The circularity for basic shapes is presented.

Figure 3-8. Cell shape parameter distributions are significantly skewed by bad image segmentation, even for visually acceptable results. The DASF algorithm produces accurate segmentation and delivers more accurate cell shape parameters, as demonstrated using the circularity and eccentricity.

Figure 3-9. The symmetry differences of NIH 3T3 fibroblasts (top) and those treated with TIAM1 shRNA (bottom) are demonstrated using 2-D contour plots of the distributions. Treatment by shRNA induced a decrease in symmetry.

Figure 3-10. The distance between the cell centroid and the centroid of a maximally inscribed ellipse was normalized by the major axis length and used to construct two histograms, for NIH 3T3 fibroblasts and for those treated with TIAM1 shRNA. The decrease of this distance in treated cells indicates more polarized, less symmetric cells.
Figure 3-11. The distribution of circularity and eccentricity for single migrating NIH 3T3 fibroblasts changes upon treatment with miR-34a, or with plasmids for Rac1 V12, RhoA T19 or Cdc42 T17.

Figure 3-12. Circularity and eccentricity can generally classify differences in cell populations based on cell type or after drug treatment.

Figure 3-13. Circularity and eccentricity can generally classify differences in cell populations based on cell type.

Figure 3-14. The three colored circles indicate rounded (orange), elongated (blue) or complex (purple) cells. These divisions do not take advantage of the complete information a set of boundary coordinates provides.

Figure 3-15. Very rounded cell types may be very difficult to classify using circularity and eccentricity, causing a bias in the ability to analyze their behavior versus other cell types. OVCAR-3 epithelial cells are used in this example against Swiss 3T3 fibroblasts.

Figure 3-16. Circularity and eccentricity can generally classify differences in cell populations based on differences in the activity of key cytoskeletal proteins.

Figure 3-17. ANOVA of the circularity, eccentricity and number of protrusions for several cell types and treatment conditions does not reveal the expected significant differences noted visually between single cells from each group.
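The style of comparison summarized in the caption above can be reproduced in miniature. The sketch below computes the one-way ANOVA F statistic by hand; the group means, spreads and sizes are invented for illustration and are not the dissertation's data (Python stands in for the Matlab used in this work).

```python
import numpy as np

def one_way_anova_f(groups):
    """F statistic for one-way ANOVA: between-group over within-group variance."""
    pooled = np.concatenate(groups)
    grand_mean = pooled.mean()
    k, n = len(groups), pooled.size
    ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

rng = np.random.default_rng(0)
control = rng.normal(0.55, 0.10, 100)   # hypothetical circularity values
subtle = rng.normal(0.57, 0.10, 100)    # small shift: may not reach significance
rounded = rng.normal(0.80, 0.10, 100)   # strongly rounded population

print(one_way_anova_f([control, subtle]) < one_way_anova_f([control, rounded]))
```

A large F relative to its critical value is exactly what the text found lacking: populations that look visually distinct can still overlap heavily in these scalar shape parameters.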
Figure 3-18. A live NIH 3T3 fibroblast adopts many configurations as it moves along a trajectory, but they are visually similar and their shape parameters do not reflect differences between subtle shape changes.

Figure 3-19. Minimal geodesics represent the shortest paths in the curved space of a manifold. The tangent space near a point on the manifold resembles flat Cartesian space and is useful for calculating geodesic paths.
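As an aside to the caption above, the flat-tangent-space approximation can be checked numerically on the most familiar curved space, the unit sphere (chosen here only for illustration, not the shape manifold itself): for nearby points, the straight chord nearly matches the true arc length.

```python
import numpy as np

def unit(v):
    return v / np.linalg.norm(v)

u = unit(np.array([1.0, 0.0, 0.0]))
v = unit(np.array([1.0, 0.05, 0.0]))                       # a nearby point
geo = float(np.arccos(np.clip(np.dot(u, v), -1.0, 1.0)))   # arc on the sphere
chord = float(np.linalg.norm(u - v))                       # straight-line path
print(abs(geo - chord) < 1e-3)                             # True: nearly equal
```

As the two points move apart, the chord increasingly underestimates the arc, which is why the text restricts the inner-product shortcut to shapes that are close on the manifold.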
Figure 3-20. Each pixel represents the minimal geodesic distance between two cell shapes, with the diagonal representing self-comparisons. These distances represent the change in total boundary curvature between one shape and another. Brighter pixels represent larger distances.
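The distances summarized above start from direction functions that have been forced to satisfy the closed-curve constraints. A simplified numerical sketch of that projection follows, loosely in the spirit of the Newton-style iteration attributed to Klassen; the discretization, the uniform-arclength assumption and the perturbed-circle test curve are all assumptions made for illustration, not the published algorithm.

```python
import numpy as np

def direction_function(x, y):
    """Direction (angle) function of a closed boundary via chord angles."""
    vx = np.diff(np.append(x, x[0]))
    vy = np.diff(np.append(y, y[0]))
    return np.unwrap(np.arctan2(vy, vx))

def project_to_constraints(theta, ds, n_iter=50):
    """Impose the rotation and closure constraints on a direction function.

    Pins the mean direction to pi, then removes the closure residual
    (the integrals of cos theta and sin theta) with a Newton step taken
    in the span of {-sin theta, cos theta}.
    """
    for _ in range(n_iter):
        theta = theta - theta.mean() + np.pi          # rotation constraint
        c, s = np.cos(theta), np.sin(theta)
        r = np.array([c.sum(), s.sum()]) * ds         # closure residual
        if np.hypot(*r) < 1e-12:
            break
        J = np.array([[np.sum(s * s), -np.sum(s * c)],
                      [-np.sum(s * c), np.sum(c * c)]]) * ds
        a, b = np.linalg.solve(J, -r)
        theta = theta + a * (-s) + b * c              # Newton update
    return theta

n = 200
ds = 2 * np.pi / n
t = np.arange(n) * ds
theta0 = direction_function(np.cos(t), np.sin(t)) + 0.2 * np.cos(t)  # not closed
theta = project_to_constraints(theta0, ds)
closure = np.hypot(np.sum(np.cos(theta)), np.sum(np.sin(theta))) * ds
print(closure < 1e-8, abs(float(theta.mean()) - np.pi) < 1e-8)
```

After projection, the residual of the closure integrals is driven to numerical zero and the mean direction sits at pi, so any two such direction functions live on the same constraint set and can be compared.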
Figure 3-21. The geodesic distance matrix can be transformed to a similarity matrix, for use in the normalized graph cut clustering algorithm, using the Gaussian function. Brighter pixels represent similar shapes.

Figure 3-22. Three clusters (right) are rapidly computed using normalized graph cuts for a 183 × 183 similarity matrix (left).
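The procedure behind the caption above can be sketched as follows. The Gaussian similarity map, leading eigenvectors, row normalization and SVD-based rotation refinement follow the steps described in the text; the initialization by mutually dissimilar rows and the toy distance matrix are simplifying assumptions for illustration, not the published algorithm in full.

```python
import numpy as np

def ncut_from_distances(D, k, sigma=1.0, n_iter=20):
    """Cluster a geodesic distance matrix with a normalized-cut-style step."""
    W = np.exp(-D**2 / (2.0 * sigma**2))           # Gaussian similarity matrix
    d = W.sum(axis=1)
    L = W / np.sqrt(np.outer(d, d))                # degree-normalized affinity
    _, vecs = np.linalg.eigh(L)                    # eigenvalues ascending
    X = vecs[:, -k:]                               # k leading eigenvectors
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    # initialize with k mutually dissimilar rows, then refine by rotation
    R = np.zeros((k, k))
    R[:, 0] = X[0]
    score = np.abs(X @ R[:, 0])
    for j in range(1, k):
        R[:, j] = X[np.argmin(score)]
        score = score + np.abs(X @ R[:, j])
    labels = (X @ R).argmax(axis=1)
    for _ in range(n_iter):
        Y = np.eye(k)[labels]                      # discrete indicator matrix
        U, _, Vt = np.linalg.svd(X.T @ Y)
        labels = (X @ (U @ Vt)).argmax(axis=1)     # rotate, re-discretize
    return labels

# two tight groups of 'shapes' at geodesic distance 5 from each other
pos = np.array([0.0, 0.0, 0.0, 5.0, 5.0, 5.0])
D = np.abs(pos[:, None] - pos[None, :])
labels = ncut_from_distances(D, 2)
print(labels)
```

Because the eigendecomposition and a handful of small matrix products dominate the cost, this runs in a fixed, short time, in contrast to the open-ended MCMC search.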
Figure 3-23. The clustering result (blue, green or red) based on geodesic distances provides greater resolution than traditional shape parameters, even for very subtle differences in the cell boundary. The clustering result also suggests that the trajectory of a single NIH 3T3 cell may be correlated to the current cell configuration.

Figure 3-24. The calculation of a mean shape using geodesic distances and tangent vectors functions identically to this example using 16 random points. Beginning from an arbitrary point, a partial step is taken toward the next point, determined by 1/n, where n is increased from two to the total number of points minus one at each step.

Figure 3-25. The mean shape of each cluster can be calculated using the shape manifold to provide potential insight into cell migration behavior.
CHAPTER 4
CONCLUSIONS AND RECOMMENDATIONS

This dissertation presents two new image processing algorithms to enhance cell mechanical studies: the dinosort algorithm and automated spatial filtering. The combination of the two, termed DASF, provides high-fidelity measurements of cell parameters based on accurate image segmentation, while ASF alone is used to dramatically improve the performance of CT image segmentation and the detection of cytoskeletal fluorescent signals. The ability to distinguish between very subtle differences in cell shape is granted, both by traditional parameters and by projecting the closed curves of cell boundaries onto a shape manifold, and the physiological effects of drug treatment, protein mutation or microRNA can be quantified through simple observation of cell behavior via common fluorescence microscopy. The state of the art in image processing is often far removed from practical daily work in cell biology. Images are taken, but only to qualitatively characterize a sample or verify the result of some molecular-based assay. Cell migration studies generally rely on parameters that reflect the displacement of a roughly estimated cell centroid rather than representing the dynamics of cell motion. Yet the dynamics of the cytoskeleton that must occur to achieve migration are a window into the complex signaling pathways underlying cell migration events. Thus, extraction of more representative cell parameters from digital images has the potential to reveal a deep story about cell mechanics and the response of a cell to a stimulus, chemical or physical. The DASF algorithm has provided the foundation for this type of cell mechanics investigation, and it has done so in a rapid, accurate, automated format.
The dinosort and ASF methods do not require in-depth knowledge of mathematics to implement independently, fostering availability to a larger biological audience and granting easy integration into openly available image processing suites like ImageJ. Automated segmentation of cell images also provides consistent measurements that are repeatable regardless of training or experience. This quality fosters the sharing of data in a standard format that can be trusted. Moreover, the concept of high-content screening must rely on very rapid analysis over sophisticated segmentation schemes that increase processing time. The loss of segmentation quality can become expensive, as greater sample size is necessary for statistical analysis, yet the distributions of cell parameters measured from poor segmentation can misrepresent the true cell configuration. DASF offers speed equivalent to very basic segmentation schemes as well as the accuracy of more sophisticated algorithms, and could potentially save time and money invested in additional sample size, reducing the cost of drug discovery screening and thereby reducing the cost of medication to consumers. Medical diagnostics can also benefit from accurate, repeatable image analysis. Human error is costly due to litigation and resources wasted on extraneous testing, which has inspired the development of computer-aided pathology. While the spatial frequency concepts underlying DASF and the improved ability of level sets to segment CT scan images should be pursued further in more medical applications, the analysis of cell shapes and organ shapes could be crucial to detecting image subtleties a physician might miss through subjective analysis. Shape manifolds are being explored to assess MRI scan data, but this dissertation is the first to apply the concept of shape manifolds to study cell boundaries. Due to the advantages of the manifold representation of cell
shapes and the potential to improve medical image segmentation through DASF, this work represents a major multidisciplinary success. However, a general class of computer vision problems could also potentially benefit from this work, including military targeting, object recognition, disaster relief and surveillance. Missiles must achieve a lightweight intelligence to determine whether to commit to a target and avoid devastating non-military bystanders. Simple classifications of shapes could help to achieve the proper discrimination without overwhelming the limited processors in those devices. Object recognition by Procrustean analysis, or direct comparison, is computationally expensive, but certain objects feature unique shape parameters that could be exploited. Since objects visible from a moving device will change location within the field of view, change scale and possibly change orientation, shape analysis is ideally suited to provide targeting information. In all of these applications, image segmentation and shape analysis are critical to success. As a final thought, the planar curves of cell boundaries give rise to many types of measurements, but the intrinsic information from many measurements is drawn from locations within the two-dimensional plane of the image and fundamentally shares the dimension of length. Applying manifolds and geodesics to the shape analysis of cell boundaries might have had its obstacles in this work, but it also opens the door to considering the boundary of the cell as something other than a planar curve. Rather, the boundary contains various protrusion features at certain frequencies, which is analogous to the Fourier decomposition of a simple curve. Yet the heterogeneity of cells generates a wild array of possible protrusions, and a straightforward decomposition of the entire cell boundary couples two protrusions together
mathematically, even though they may in truth arise from distinct physiological activity. It is strongly suggested that cell boundaries be segregated by fitting an ellipse or some other means, and that the distinct protrusions identified afterwards be treated separately as open curves and subjected to shape analysis. In that scenario, high-curvature trailing-end protrusions would not influence the analysis of a broad, flat lamellipodium at the front end of the cell. Furthermore, rounded cells may appear to have several dull protrusions, but none would be similar in shape or number to a very polarized fibroblast protrusion. Single cells may then be classified by the protrusions they exhibit, rather than by their general shape. The implications for cytoskeletal architecture should be immediate, but the possible ramifications of this strategy for pathological cells could prove to elucidate many obscure cell behaviors.
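The Fourier analogy mentioned above can be made concrete with a toy boundary; the modulation frequency and amplitude below are invented for illustration, and the complex-coordinate trick is one common way to decompose a planar curve.

```python
import numpy as np

t = np.linspace(0, 2 * np.pi, 256, endpoint=False)
r = 1.0 + 0.2 * np.cos(3 * t)            # three protrusion-like bumps on a circle
z = r * np.exp(1j * t)                   # boundary points as complex coordinates
spec = np.abs(np.fft.fft(z)) / z.size    # magnitude spectrum of the curve
idx = sorted(int(i) for i in np.argsort(spec)[-3:])
print(idx)  # [1, 4, 254]: the base circle plus side-bands at 1 +/- 3
```

The example also shows the coupling the text warns about: a single frequency-3 modulation of the radius appears as two coupled terms in the decomposition of the whole curve, which is part of the argument for analyzing protrusions separately as open curves.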
APPENDIX
METHODS

Fluorescence Microscopy

A Nikon TE 2000 microscope (Nikon, Melville, NY), equipped with an X-Cite 120 PC fluorescent light source (EXFO, Ontario, Canada) and a Cascade:1K CCD camera (Roper Scientific, Tucson, AZ), was used to acquire microscopic images at 20x magnification. In addition, an on-stage incubator (In Vivo Scientific, St. Louis, MO) with temperature control and a supplementary CO2 system was operated along with the microscope to maintain the experimental environment at 10% CO2 and 37 °C.

Cell Culture, Plasmids and Transfection

NIH 3T3 fibroblasts (ATCC, Manassas, VA) were maintained in DMEM with 10% fetal bovine serum and 1% L-glutamine (all purchased from Mediatech Inc., Manassas, VA) in a humidified incubator at 37 °C and 10% CO2. For transfection, cultured cells were prepared on fibronectin-coated glass-bottom dishes (MatTek, Ashland, MA) under normal culturing conditions for 24 hours. Then the cell culture was incubated for 1 hour in 1 ml Opti-MEM I Reduced Serum Media (Invitrogen, Carlsbad, CA), supplemented with a premixed transfection solution containing 4 µg pERFP C RS plasmid (Clontech, Mountain View, CA) and 5 µl Lipofectamine 2000 (Invitrogen), to introduce the plasmids into the cells. Afterward, the cells were maintained in normal cell culture media for further experiments.

Variable Pressure Scanning Electron Microscopy

Highly resolved cell images were acquired by an S 3000N variable pressure scanning electron microscope (Hitachi, San Francisco, CA). An acceleration voltage of 5 kilovolts and a 10.3 mm working distance were used. This type of scanning electron
microscopy (SEM) uses a low vacuum (60 Pa) to enable the imaging of samples without a special coating such as gold (Kirk et al., 2008). Cells expressing red fluorescent protein (RFP) were passed to glass slides and allowed to adhere before being fixed in glutaraldehyde for 10 minutes, washed in phosphate-buffered saline and stored in de-ionized water. Cell samples were allowed to air dry before the vpSEM experiments, and were imaged using fluorescence microscopy immediately before and after the vpSEM to allow for direct comparison of the same cells between the two techniques. In this case, sample shrinkage was not a concern since only the fluorescent signal was of interest; hence, special freeze-drying was not used to prepare samples.

Image Analysis and Threshold Determination

Raw images were processed using a custom-designed routine in Matlab (The MathWorks, Natick, MA) as described in Appendix B.

Transfection

Cell cultures, plasmids, transfection and reagents: NIH 3T3 fibroblasts were obtained from the American Type Culture Collection (ATCC, Manassas, VA); ovarian cells OSE10, OVCAR3 and MSPC1 were generous gifts from Professor Ie-Ming Shih (JHMI); cDNA that expresses green fluorescent protein (GFP) was purchased from Clontech (Mountain View, CA). Culture cells for aim 3, including BE, LS174T, SW962, WM266.4, A375 and A375P, will be purchased from ATCC. A375m2 cells can be obtained by overexpressing human RhoC proteins in A375 cells (ATCC). A cDNA construct of human RhoC can be purchased from OriGene (Rockville, MD). Transfection will be carried out using Lipofectamine 2000 (Invitrogen, Carlsbad, CA). Hoechst 33342 can be purchased from Sigma (St. Louis, MO).
REFERENCES

Aigner, K., et al. 2007. The transcription factor ZEB1 (deltaEF1) represses Plakophilin 3 during human cancer progression. FEBS Lett 581, 1617-1624.

Aldridge, B., J. Burke, D. Lauffenburger, P. Sorger. 2006. Physicochemical modelling of cell signalling pathways. Nat Cell Biol 8, 1195-1203.

Arthur, W., K. Burridge. 2001. RhoA inactivation by p190RhoGAP regulates cell spreading and migration by promoting membrane protrusion and polarity. Mol Biol Cell 12, 2711-2720.

Bakal, C., J. Aach, G. Church, N. Perrimon. 2007. Quantitative morphological signatures define local signaling networks regulating cell morphology. Science 316, 1753-1756.

Baranwal, S., S. Alahari. 2010. miRNA control of tumor cell invasion and metastasis. Int J Cancer 126, 1283-1290.

Beck, A., et al. 2011. Systematic analysis of breast cancer morphology uncovers stromal features associated with survival. Sci Transl Med 3(108), 1-11.

Bickle, M. 2008. High content screening: a new primary screening tool? IDrugs 11(11), 822-826.

Bissel, M., M. LaBarge. 2005. Context, tissue plasticity, and cancer: are tumor stem cells also regulated by the microenvironment? Cancer Cell 7, 17-23.

Boehm, M., F. Slack. 2006. MicroRNA control of lifespan and metabolism. Cell Cycle 5, 837-840.

Braga, V., L. Machesky, A. Hall, N. Hotchin. 1997. The small GTPases Rho and Rac are required for the establishment of cadherin-dependent cell-cell contacts. J Cell Biol 137, 1421-1431.

Brazma, A., M. Krestyaninova, U. Sarkans. 2006. Standards for systems biology. Nat Rev Genet 7, 593-605.

Cano, A., et al. 2000. The transcription factor snail controls epithelial-mesenchymal transitions by repressing E-cadherin expression. Nat Cell Biol 2, 76-83.

Carleton, M., M. Cleary, P. Linsley. 2007. MicroRNAs and cell cycle regulation. Cell Cycle 6, 2127-2132.

Carpenter, A., et al. 2006. CellProfiler: image analysis software for identifying and quantifying cell phenotypes. Genome Biology 7:R100.
Chen, C., M. Mrksich, S. Huang, G. Whitesides, D. Ingber. 1997. Geometric control of cell life and death. Science 276, 1425-1428.

Chen, C., H. Li, X. Zhou, S. Wong. 2007. Graph cut based active contour for automated cellular image segmentation in high throughput RNA interference (RNAi) screening. Biomedical Imaging: From Nano to Macro, 4th IEEE International Symposium, 69-72.

Clayton, T., et al. 2006. Pharmaco-metabonomic phenotyping and personalized drug treatment. Nature 440, 1073-1077.

Collins, T. 2007. ImageJ for microscopy. BioTechniques 43(1), 25-30.

Conacci-Sorrell, M., et al. 2003. Autoregulation of E-cadherin expression by cadherin-cadherin interactions: the roles of beta-catenin signaling, Slug, and MAPK. J Cell Biol 163, 847-857.

Cour, T. 2005. Spectral segmentation with multiscale graph decomposition. Computer Vision and Pattern Recognition, IEEE Computer Society Conference, 2, 1124-1131.

Cox, D., D. Hinkley. 1979. Theoretical Statistics. CRC Press.

Croce, C. 2009. Causes and consequences of microRNA dysregulation in cancer. Nat Rev Genet 10, 704-714.

Daubechies, I. 1992. Ten Lectures on Wavelets. Regional Conference Series in Applied Mathematics 61.

Do Carmo, M. 1976. Differential Geometry of Curves and Surfaces. Prentice-Hall, Inc.

Domachuk, P., K. Tsioris, F. Omenetto, D. Kaplan. 2010. Bio-microfluidics: biomaterials and biomimetic designs. Adv Mater 22, 249-260.

Duncan, J. 2000. Medical image analysis: progress over two decades and the challenges ahead. IEEE Trans on Pattern Analysis and Machine Intelligence 22(1), 85-105.

Dzyubachyk, O., et al. 2010. Automated analysis of time-lapse fluorescence microscopy images: from live cell images to intracellular foci. Bioinformatics 26(19), 2424-2430.

Eastham, A., et al. 2007. Epithelial-mesenchymal transition events during human embryonic stem cell differentiation. Cancer Res 67, 11254-11262.

Eisenberg, I., et al. 2007. Distinctive patterns of microRNA expression in primary muscular disorders. Proc Natl Acad Sci U S A 104, 17016-17021.
Ferreira, A., H. Isaacs, J. Hayflick, K. Rogers, M. Sandig. 2006. The p110delta isoform of PI3K differentially regulates beta1 and beta2 integrin-mediated monocyte adhesion and spreading and modulates diapedesis. Microcirculation 13, 439-456.

Filipowicz, W., S. Bhattacharyya, N. Sonenberg. 2008. Mechanisms of post-transcriptional regulation by microRNAs: are the answers in sight? Nat Rev Genet 9, 102-114.

Gallagher, I., et al. 2010. Integration of microRNA changes in vivo identifies novel molecular features of muscle insulin resistance in type 2 diabetes. Genome Med 2, 9.

Ghosh, Z., B. Mallick, J. Chakrabarti. 2009. Cellular versus viral microRNAs in host-virus interaction. Nucleic Acids Res 37, 1035-1048.

Gillette, B., et al. 2008. In situ collagen assembly for integrating microfabricated three-dimensional cell-seeded matrices. Nat Mater 7, 636-640.

Godin, M., et al. 2010. Using buoyant mass to measure the growth of single cells. Nat Methods 7, 387-390.

Gonzalez, R., R. Woods. 2007. Digital Image Processing. Prentice Hall, third edition.

Gottardo, F., et al. 2007. Micro-RNA profiling in kidney and bladder cancers. Urol Oncol 25, 387-392.

Gupta, G., J. Massague. 2006. Cancer metastasis: building a framework. Cell 127, 679-695.

Gutschner, T., et al. 2013. The noncoding RNA MALAT1 is a critical regulator of the metastasis phenotype of lung cancer cells. Cancer Res 73, 1180-1189.

Hajra, K., D. Chen, E. Fearon. 2002. The SLUG zinc finger protein represses E-cadherin in breast cancer. Cancer Res 62, 1613-1618.

Hall, A. 1998. Rho GTPases and the actin cytoskeleton. Science 279, 509-514.

He, L., et al. 2007. A microRNA component of the p53 tumour suppressor network. Nature 447, 1130-1134.

Hebert, S., et al. 2008. Loss of microRNA cluster miR-29a/b-1 in sporadic Alzheimer's disease correlates with increased BACE1/beta-secretase expression. Proc Natl Acad Sci U S A 105, 6415-6420.

Hermeking, H. 2007. p53 enters the microRNA world. Cancer Cell 12, 414-418.

Hoggar, S. 2006. Mathematics of Digital Images. Cambridge Press.

Hordijk, P., et al. 1997. Inhibition of invasion of epithelial cells by Tiam1-Rac signaling. Science 278, 1464-1466.

Huang, B., H. Babcock, X. Zhuang. 2010. Breaking the diffraction barrier: super-resolution imaging of cells. Cell 143, 1047-1058.

Huber, P. 1981. Robust Statistics. New York: John Wiley and Sons.

Inui, M., G. Martello, S. Piccolo. 2010. MicroRNA control of signal transduction. Nat Rev Mol Cell Biol 11, 252-263.

Janes, K., et al. 2005. A systems model of signaling identifies a molecular basis set for cytokine-induced apoptosis. Science 310, 1646-1653.

Jiang, C., P. Xuebiao, G. Minghong. 1993. The use of mixture models to detect effects of major genes on quantitative characters in a plant breeding experiment. Genetics 136, 383-394.

Katira, P., Z. Muhammad, R. Bonnecaze. 2012. How changes in cell mechanical properties induce cancerous behavior. Physical Review Letters 108(2).

Kaverina, I., O. Krylyshkina, J. Small. 1999. Microtubule targeting of substrate contacts promotes their relaxation and dissociation. J Cell Biol 146, 1033-1044.

Keren, K., Z. Pincus, G. Allen, E. Barnhart, G. Marriott, A. Mogilner, J. Theriot. 2008. Mechanism of shape determination in motile cells. Nature 453(22), 475-481.

Khatau, S., et al. 2009. A perinuclear actin cap regulates nuclear shape. Proc Natl Acad Sci U S A 106, 19017-19022.

Kim, Y., et al. 2005. Modulating the strength of cadherin adhesion: evidence for a novel adhesion complex. J Cell Sci 118, 3883-3894.

Kirk, S., J. Skepper, A. Donald. 2008. Application of environmental scanning electron microscopy to determine biological surface structure. Journal of Microscopy 233(2), 205-224.

Kitano, H. 2002. Computational systems biology. Nature 420, 206-210.

Klassen, E., A. Srivastava, W. Mio, S. Joshi. 2004. Analysis of planar shapes using geodesic paths on shape spaces. IEEE Transactions on Pattern Analysis and Machine Intelligence 26(3), 372-383.

Kole, T., Y. Tseng, L. Huang, J. Katz, D. Wirtz. 2004.
Rho kinase regulates the intracellular micromechanical response of adherent cells to rho activation. Mol Biol Cell 15, 3475 3484.
163 Kole, T., Y. Tseng, I. Jiang, J. Katz, D. Wirtz. 2005. Intracellular mechanics of migrating fibroblasts. Mol Biol Cell 16, 328 338. Korichneva, I., U. HÃ¤mmerling. 1999. F actin as a functional target for retro retinoids: a potential role in anhydroretinol triggered cell death. J Cell Sci 112 ( Pt 15), 2521 2528. Krichevsky, A., G. Gabriely. 2009. miR 21: a small multi faceted RNA. Journal of cellular and molecular medicine 13, 39 53. Krim, H., A. Yezzi. 2006 . Statistics and Analysis of S hapes. Boston; Basel: BirkhÃ¤user Kumar, S., V. Weaver . 2009. Mechanics, malignancy, and metastasis: the force journey of a tumor cell. Cancer Metastasis Rev 28, 113 127. Le, H., D. Kendall. 1993 . The Riemannian structure of Euclidian shape spaces: a novel environment for statistics. Annals of Statistics 21(3 ), 1225 1271. Lee, J., M. Chang, Y. Tseng, D. Wirtz. 2005. Cdc42 mediates nucleus movement and MTOC polarization in Swiss 3T3 fibroblasts under mechanical shear stress. Mol Biol Cell 16, 871 880. Lee, J., et al. 2006. Ballistic intracellular nanorheology r eveals ROCK hard cytoplasmic stiffening response to fluid flow. J Cell Sci 119, 1760 1768. Lee, J., et al. 2007. Nuclear lamin a/c deficiency induces defects in cell mechanics, polarization, and migration. Biophys J 93, 2542 2552. Li, C., et al. 2011 . A le vel set method for image segmentation in the presence of intensity inhomogeneities with application to MRI. IEEE Trans. Image Processing 20(7), 2007 2016. Lim, L., et al. 2005. Microarray analysis shows that some microRNAs downregulate large numbers of tar get mRNAs. Nature 433, 769 773. Liu, X., Y. Shi, I. Dinov, W. Mio. 2010 . A computational model of multidimensional shape. International Journal of Computer Vision 89, 69 83. Lu, J., et al. 2005. MicroRNA expression profiles classify human cancers. Nature 4 35, 834 838. Machacek, M., G. Danuser. 2006. Morphodynamic profiling of protrusion phenotypes. Biophysical Journal 90, 1439 1452. Machacek, M., et al. 2009. 
Coordination of Rho GTPase activities during cell protrusion. Nature 461, 99 103.
164 Malla di, R., J. Sethian, B. Vemuri. 1995 . Shape modeling with front propagation: a level set approach. Pattern Analysis and Machine Intelligence 17(2), 158 175 . Mao, L. et al. 2012. Circadian gating of epithelial to mesenchymal transition in breast cancer cells via melato nin 1820. Marr, D., E. Hildreth. 1980. Theory of Edge Detection. Proceedings of the Royal Society of London. Series B, Biological Sciences 207(1167),187 217. Mason, T., K. Ganesan, J. vanZanten, D. Wirtz, S. Kuo . 1997. Particle tracking microrheology of complex fluids. Physical Review Letters 79, 3282 3285. Meng, F. et al. 2007. MicroRNA 21 regulates expression of the PTEN tumor suppressor gene in human hepatocellular cancer. Gastroenterology 133, 647 658. Miglio re, C., S. Giordano. 2009. MiRNAs as new master players. Cell Cycle 8, 2185 2186. Miklos, G., R. Maleszka. 2004. Microarray reality checks in the context of a complex disease. Nat Biotechnol 22, 615 621. Muzzey, D., A. Oudenaarden, 2009. Quantitative time lapse fluorescence microscopy in single cells. Annu. Rev. Cell Dev. Biol. 25, 301 327. Nakaseko, Y., M. Yanagida. 2001. Cell biology. Cytoskeleton in the cell cycle. Nature 412, 291 292. Nawshad, A., D. Lagamba, A. Polad, E. Hay. 2005. Transforming growth factor beta signaling during epithelial mesenchymal transformation: implications for embryogenesis and tumor metastasis. Cells Tissues Organs 179, 11 23. Nieto, M. 2002. The snail superfamily of zinc finger transcription factors. Nat Rev Mol Cell Biol 3, 1 55 166. Nimnual, A., L. Taylor, D. Bar Sagi. 2003. Redox dependent downregulation of Rho by Rac. Nat Cell Biol 5, 236 241. Nobes, C., A. Hall. 1995. Rho, rac, and cdc42 GTPases regulate the assembly of multimolecular focal complexes associated with actin s tress fibers, lamellipodia, and filopodia. Cell 81, 53 62. O hta, Y., J. Hartwig, T. Stossel. 2006. FilGAP, a Rho and ROCK regulated GAP for Rac binds filamin A to control actin remodelling. Nat Cell Biol 8, 803 814. 
Omelchenko, T., J. Vasiliev, I. Gelfand , H. Feder, E. Bonder. 2002. Mechanisms of polarization of the shape of fibroblasts and epitheliocytes: Separation of the roles of microtubules and Rho dependent actin myosin contractility. Proc Natl Acad Sci U S A 99, 10452 10457.
165 Onder, T., et al. 2008. Loss of E cadherin promotes metastasis via multiple downstream transcriptional pathways. Cancer Res 68, 3645 3654. Panorchan, P., et al. 2006. Single molecule analysis of cadherin mediated cell cell adhesion. J Cell Sci 119, 66 74. Panorchan, P., J. Lee, T . Kole, Y. Tseng, D. Wirtz. 2006. Microrheology and ROCK signaling of human endothelial cells embedded in a 3D matrix. Biophys J 91, 3499 3507. Papakonstanti, E., A. Ridley, B. Vanhaesebroeck. 2007. The p110delta isoform of PI 3 kinase negatively controls RhoA and PTEN. EMBO J 26, 3050 3061. Pepperkok, R., J. Ellenberg. 2006. High throughput fluorescence microscopy for systems biology. Nat Rev Mol Cell Biol 7, 690 696. Perona, P., J. Malik. 1990. Scale Space and Edge Detection Using Anisotropic Diffusion. I EEE Trans on Pattern Analysis and Machine Intelligence 12(7), 629 939. Pfeffer, S., et al. 2004. Identification of virus encoded microRNAs. Science 304, 734 736. Poy, M., et al. 2004. A pancreatic islet specific microRNA regulates insulin secretion. Nature 432, 226 230. Prasad, S., et al. 2009. Unique microRNA profile in end stage heart failure indicates alterations in specific cardiovascular signaling networks. J Biol Chem 284, 27487 27499. Price, J., E. Hunter, D. Gough. 1996. Accuracy of least squares de signed spatial FIR filters for segmentation of images of fluorescence stained cell nuclei. Cytometry 25, 303 316. Polizio, A., et al. 2011. Heterotrimeric Gi proteins link Hedgehog signaling to activation of Rho small GTPases to promote fibroblast migratio n. J Biol Chem 286, 19589 19596. Rafelski, S., W. Mars hall. 2008. Building the cell: design principles of cellular architecture. Nat Rev Mol Cell Biol 9, 593 602. Ray, N., S. Acton, K. Ley. 2002. Tracking leukocytes in vivo with shape and size constrained active contours. IEEE Trans on Med Imaging 21(10), 1222 1235. Rid, R., N. Schiefermeier, I. Grigoriev, J. Small, I. Kaverina. 2005. 
The last but not the least: th e origin and significance of trailing adhesions in fibroblastic cells. Cell Motil Cytoskeleton 61, 161 171.
166 Ridley, A., 2001. Rho GTPases and cell migration. J Cell Sci 114, 2713 2722. Rimm, D. 2011. C path: a Watson like visit to the pathology lab. Sci Tr ansl Med 3(108), 1 2. Rodriguez, A. et al. 2007. Requirement of bic/microRNA 155 for normal immune function. Science 316, 608 611. Rosenfeld, N. et al. 2008. MicroRNAs accurately identify cancer tissue origin. Nat Biotechnol 26, 462 469. Rottner, K., A. Ha ll, J. Small. 1999. Interplay between Rac and Rho in the control of substrate contact dynamics. Curr Biol 9, 640 648. Ruikar, S ., D. Sachin, Doye D. 2011 Wavelet based image denoising technique. International Journal of Advanced Computer Science and Applic ations 2(3) Sahai, E., C. Marshall. 2002. RHO GTPases and cancer. Nat Rev Cancer 2, 133 142. Sahai, E., C. Marshall. 2003. Differing modes of tumour cell invasion have distinct requirements for Rho/ROCK signalling and extracellular proteolysis. Nat Cell Bi ol 5, 711 719. Samadani, R. 1989 . Changes in connectivity in active contour models. Workshop on Visual Motion. Schwartz , M. 2011 . Super Resolution microscopy: a new dimension in focal adhesions. Current Biology 21(3), R115 116. Seba stian, T., P. Klein, B. Kimia. 2003 . On aligning curves. Pattern Analysi s and Machine Intelligence 25(1) . Shah, R. 2009. Current perspectives on the Gleason grading of prostate cancer. Arch Pathol Lab Med 133, 1810 1816. Shi, J., J. Malik. 2000. Normalized cuts and image segmenta tion. IEEE Trans on Pattern Analysis and Machine Intelligence 22(8), 888 905. Siegel, R., D. Naishadham, A. Jemal. 2013. Cancer statistics, 2013. CA Cancer J Clin 63, 11 30. Sluder, G., J. Nordberg. 2007. Microscope basics. Methods in Cell Biology 81, 1 10 . Spaderna, S., et al. 2008. The transcriptional repressor ZEB1 promotes metastasis and loss of cell polarity in cancer. Cancer Res 68, 537 544. Srivastava, A., Joshi, S. 2003 . A geometric approach to shape clustering and learning . 2003 IEEE Workshops on Statistical Signal Processing 302 305.
167 Srivastava, A., S. Joshi, W. Mio , X. Liu. 2005 . Statistical shape analysis: clustering, learning, and testing. Pattern Analysis and Machine Intelligence 27(4), 590 602 . Stefani, G., F. Slack. 2008. Small non coding RN As in animal development. Nat Rev Mol Cell Biol 9, 219 230. Thomas, N. 2009. High content screening: a decade of evolution. Journal of Biomolecular Screening 15(1), 1 9. Tseng, Y., T. Kole, D. Wirtz. 2002. Micromechanical mapping of live cells by multiple particle tracking microrheology. Biophys J 83, 3162 3176. Uehata, M. et al. 1997. Calcium sensitization of smooth muscle mediated by a Rho associated protein kinase in hypertension. Nature 389, 990 994. Vogelstein, B., et al. 2013. Cancer genome landscapes . Science 339, 1546 1558. Volinia, S. et al. 2006. A microRNA expression signature of human solid tumors defines cancer gene targets. Proc Natl Acad Sci U S A 103, 2257 2261. Walsh, L., S. Damjanovski. 2011. IGF 1 increases invasive potential of MCF 7 brea st cancer cells and induces activation of latent TGF mesenchymal transition. Cell Commun Signal 9, 10. Wang, J., et al. 2008. Cellular phenotype recognition for high content RNA interference genome wide screening. Journal of Biomolecular Screening 13(1), 29 39. Watanabe, T., K. Sato, K. Kaibuchi. 2009. Cadherin mediated intercellular adhesion and signaling cascades involving small GTPases. Cold Spring Harb Perspect Biol 1, a003020. Waters, J. 2009. Accuracy and precision in qu antitative fluorescence microscopy. Journal of Cell Biology 185(7), 1135 1148. Wienholds, E., et al. 2005. MicroRNA expression in zebrafish embryonic development. Science 309, 310 311. Williams, A., et al. 2009. MicroRNA 206 delays ALS progression and prom otes regeneration of neuromuscular synapses in mice. Science 326, 1549 1554. Wolf, K. et al. 2003. Compensation mechanism in tumor cell migration: mesenchymal amoeboid transition after blocking of pericellular proteolysis. J Cell Biol 160, 267 277. Woods, R. 2003 . 
Characterizing volume and surface deformations in an atlas framework: theory, applications, and implementation. NeuroImage 18, 769 788.
168 Wu, H., V. Goel, F. Haluska. 2003. PTEN signaling pathways in melanoma. Oncogene 22, 3113 3122. Wu, P., S. Arce , P. Burney, Y. Tseng. 2009. A novel approach to high accuracy of video based microrheology. Biophys J 96, 5103 5111. Wu, P., N. Nelson, Y. Tseng. 2010. A general method for improving spatial resolution by optimization of electron multiplication in CCD ima ging . Optics Express 18, 5199 5212. Yang, J., et al. 2004. Twist, a master regulator of morphogenesis, plays an essential role in tumor metastasis. Cell 117, 927 939. Yang, J., R. Weinberg. 2008. Epithelial mesenchymal transition: at the crossroads of deve lopment and tumor metastasis. Dev Cell 14, 818 829. Yao, M., C. Sharp, L. DeBrunner, V. DeBrunner. 1997 . Image processing for a line scan camera system. IEEE Signals, Systems and Computers 1, 130 134. Yuan, Y., et al. 2012. Quantitative image analysis of c ellular heterogeneity in breast tumors complements genomic profiling. Sci Transl Med 4(157), 1 10. Zhang, B., et al. 2007. Gaussian approximations of fluorescence microscope point spread function models. Applied Optics 46, 1819 1829. Zhang, D., G. Lu. 2004 . Review of shape representation and description techniques. Pat. Recog. 37, 1 19. Zhang, S., D. Yu. 2010. PI(3)king apart PTEN's role in cancer. Clin Cancer Res 16, 4325 4330.
BIOGRAPHICAL SKETCH Stephen received a bachelor's degree in food science and human nutrition from the University of Florida in 2005 through the College of Agricultural and Life Sciences. He also completed his EMT certification at Santa Fe Community College in 2004. After graduation, he worked as an intern at the Jacksonville Cardiovascular Center in Florida for six months before deciding to transition from a medical track into chemical engineering. Stephen attended Santa Fe Community College to complete the mathematics requirements for the ChemE program at UF, and upon successful completion of the prerequisites, he entered the master's program in chemical engineering at the University of Florida in 2007. Upon completion of the master's degree, he continued in the research lab as a volunteer to complete some of his master's work with Pei-Hsun Wu. That effort was rewarded with a second-author publication in 2009, and Stephen was also successful in writing a small research grant and was awarded the prestigious McKnight Doctoral Fellowship, which allowed him to return to the UF ChemE family as a PhD student in 2010. In 2011, he was elected President of the Graduate Association of Chemical Engineers and remained active in the student community as a leader. Stephen successfully completed his qualifying exam in 2012 on the topic of quantification of cell shape, was awarded the SEAGEP grant in 2013 as well as a supplemental retention grant from the graduate school, and successfully defended his dissertation in March of 2014. Throughout his education, Stephen played recreational soccer, learned to boulder, helped fellow PhD students learn image analysis and programming, learned some basic Chinese, and started an occasionally successful garden in his front yard. In 2010, he met the love of his life, Megan, proposed to her in 2012, and married her in 2013. They enjoy traveling, family, and walks at the end of the day to help keep the important things of life in mind.