COMPLEX NETWORK ASSORTMENT AND MODELING

By
ASHWIN ARULSELVAN

A THESIS PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE

UNIVERSITY OF FLORIDA
2006

Copyright 2006 by Ashwin Arulselvan

ACKNOWLEDGMENTS

I would like to express my gratitude to my advisor, Dr. Panos M. Pardalos, for all the valuable guidance and immense support he gave me while I was working on this thesis. I am really thankful to him. I would also like to thank Dr. J. Cole Smith, member of my committee, for his remarks, criticisms and advice, which improved the quality of the thesis in every possible way. I also thank my family for their moral support.

TABLE OF CONTENTS

ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT
CHAPTER
1 INTRODUCTION
2 IDENTIFYING CONNECTED COMPONENTS IN THE MARKET GRAPH
   Introduction
   Preliminaries of Graph Theory
   Motivation and Techniques for Finding Connected Component in the Market Graph
   Structure of Connected Components in the Market Graph
   Size of Connected Components in the Market Graph in the Context of Power-Law Model
   Structure of Connected Components in the Market Graph
   Concluding Remarks
3 EVOLUTION OF SOCIAL NETWORK
   Introduction
   Model
   Performance Analyses of the Model
   Results
   Conclusions
4 CONCLUSIONS
LIST OF REFERENCES
BIOGRAPHICAL SKETCH

LIST OF TABLES

2-1 Arrangement of stocks into groups for the market graph with threshold of 0.5
2-2 Dates and mean correlations corresponding to each 500-day shift
2-3 Stocks contained in largest size group for eleven time periods (1 being the oldest period, and 11 being the most recent)
3-1 Clustering coefficient, assortative mixing coefficient and average length of the giant component

LIST OF FIGURES

2-1 Largest group size by time period (A corresponds to the threshold value of 0.7, B corresponds to the threshold of 0.6, and C pertains to the market graph with threshold 0.5)
3-1 Degree distribution for n = 500 nodes (R-square = 0.9056)
3-2 Degree distribution for n = 800 nodes (R-square = 0.9563)
3-3 Degree distribution for n = 1000 nodes (R-square = 0.9535)
3-4 Degree distribution for n = 1200 nodes (R-square = 0.9511)
3-5 Degree distribution for n = 1500 nodes (R-square = 0.9549)
3-6 Degree distribution for n = 1700 nodes (R-square = 0.9535)
3-7 Degree distribution for n = 2000 nodes (R-square = 0.9541)

Abstract of Thesis Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Master of Science

COMPLEX NETWORK ASSORTMENT AND MODELING

By Ashwin Arulselvan
August 2006
Chair: Panos M. Pardalos
Major Department: Industrial and Systems Engineering

Most real-world networks have been observed to follow the power-law model and to possess highly clustered subgraphs and small diameters. Finance and social networks are no exception. In this thesis, we consider a recently introduced network-based representation of the U.S. stock market, which follows the power-law model. We propose a computationally efficient technique for identifying clusters of similar stocks in the market by partitioning the market graph into a set of connected components. It turns out that these groups have a specific structure, in which each cluster corresponds to certain industrial segments. Moreover, the size of these connected components is consistent with the theoretical properties of the power-law model.
We then present a model that simulates the growth of a social network over time by considering weights for relationship strength and identities, that is, features attributed to the individuals (represented as nodes in the network) that help in their hierarchical classification. Other factors that influence the evolution include the mutual acquaintances between a given pair of nodes and the time of the last acquaintance between them. Our simulation produced a network with many features observed in real-world networks, including high clustering and assortative mixing coefficients, a scale-free degree distribution, and the small-world phenomenon.

CHAPTER 1
INTRODUCTION

Complex networks have attracted a lot of attention in recent years because many real-world networks share the same intricate features; studying these features is mathematically interesting and helps us understand the networks better. Complex networks differ considerably from the random graph model presented by Erdős and Rényi [1]. For instance, random graphs follow a Poisson degree distribution, while complex networks follow a power-law distribution [2-3]. The power-law distribution is scale free, and for this reason complex networks are also referred to as scale-free networks. These networks are highly clustered compared to random graphs [4]. They have also been observed to exhibit the small-world phenomenon [4-6]. We take two such networks for the study presented in this thesis. In chapter 2, after a brief introduction to finance graphs, we present an efficient technique to assort assets into highly correlated groups. In chapter 3, we suggest a statistical model that simulates the growth of a social network and summarize the precision of the model.

CHAPTER 2
IDENTIFYING CONNECTED COMPONENTS IN THE MARKET GRAPH

We consider a recently introduced network-based representation of the U.S. stock market referred to as the market graph, which has been shown to follow the power-law model.
We propose a computationally efficient technique for identifying clusters of similar stocks in the market by partitioning the market graph into a set of connected components. It turns out that these groups have a specific structure, where each cluster corresponds to certain industrial segments. Moreover, the size of these connected components is consistent with the theoretical properties of the power-law model.

Introduction

Given the huge amount of data generated by the stock market on a daily basis, the importance of discovering efficient ways to represent and analyze these data becomes apparent. Stock market data are generally illustrated by plots displaying the price of a certain stock during various time periods. However, as the number of stocks increases, analyzing the information contained in such plots becomes more and more complicated. In our study we adopted an alternative approach to exploring stock market data. Specifically, we applied a recently developed technique of representing stock market prices over time in the form of a network, with the stocks as the nodes and the edges induced by the relations between the prices of two different stocks. This network is called the market graph [7-9].

It is worthwhile to mention that this approach to representing massive datasets is widely used in many different areas such as social sciences, finance, genomics, and protein folding [10-15]. A large dataset arising in an application is interpreted as a graph, where the elements of the dataset are the vertices and the relationships between those elements are represented by the edges. In many cases, such network representations prove to be extremely useful and convenient for analyzing the information and elucidating hidden dependencies in the data.
A network representation of the stock market data is derived from the cross-correlation of price fluctuations over a certain time period. We construct the market graph as follows: each node in the graph corresponds to a particular stock, and two nodes are connected by an edge if the price correlation coefficient for the pair of associated stocks (computed over a specific period of time) exceeds a given threshold. Let us now describe the procedure for constructing the market graph. Denote the price of financial instrument $i$ on day $t$ by $P_i(t)$. The logarithm of the return on asset $i$ over the one-day period from $t-1$ to $t$ is given in equation (2.1):

$$R_i(t) = \ln\frac{P_i(t)}{P_i(t-1)} \tag{2.1}$$

Then the correlation coefficient between instruments $i$ and $j$ can be computed as shown in equation (2.2):

$$C_{ij} = \frac{\langle R_i R_j\rangle - \langle R_i\rangle\langle R_j\rangle}{\sqrt{\langle R_i^2 - \langle R_i\rangle^2\rangle\,\langle R_j^2 - \langle R_j\rangle^2\rangle}} \tag{2.2}$$

In equation (2.2), $\langle R_i\rangle$ represents the average return of instrument $i$ over the $N$ considered days (i.e., the period) [16] and is given in equation (2.3); the averages of products and squares are defined analogously.

$$\langle R_i\rangle = \frac{1}{N}\sum_{t=1}^{N} R_i(t) \tag{2.3}$$

Fix a threshold $\theta \in [-1, 1]$. For each pair of stocks with $C_{ij} > \theta$, we add an edge between nodes $i$ and $j$ of the graph. This indicates that the two stocks display a similar behavior over time. In particular, the degree of similarity is determined by the prescribed value of the threshold. From the above it follows that the analysis of the patterns exhibited by the market graph can provide some useful insights into the inner structure of the stock market.

Interestingly, the previous study indicated that the degree distribution of the market graph can be described by the power-law model [11]. We say that a vertex has degree $k$ if there are $k$ edges incident to it. In accordance with the power-law model, the probability of a vertex having degree $k$ is given in equation (2.4),

$$P(k) \propto k^{-\gamma} \tag{2.4}$$

or, equivalently,

$$\log P(k) \propto -\gamma \log k \tag{2.5}$$

Equation (2.5) implies that the degree distribution plotted on a logarithmic scale is a straight line. Furthermore, the degree distribution of the graph is a key characteristic that describes the real-life dataset corresponding to this graph.
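The construction described above (log returns, pairwise correlation, thresholding) can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the code used in the thesis; the function names, the dictionary-of-sets graph representation, and the toy price series are assumptions made for the example.

```python
import math
from itertools import combinations


def log_returns(prices):
    """R_i(t) = ln(P_i(t) / P_i(t-1)), as in equation (2.1)."""
    return [math.log(prices[t] / prices[t - 1]) for t in range(1, len(prices))]


def mean(values):
    """Average over the considered days, as in equation (2.3)."""
    return sum(values) / len(values)


def correlation(r_i, r_j):
    """Correlation coefficient C_ij of two return series, as in equation (2.2)."""
    m_i, m_j = mean(r_i), mean(r_j)
    cov = mean([a * b for a, b in zip(r_i, r_j)]) - m_i * m_j
    var_i = mean([a * a for a in r_i]) - m_i * m_i
    var_j = mean([b * b for b in r_j]) - m_j * m_j
    return cov / math.sqrt(var_i * var_j)


def market_graph(price_series, theta):
    """Adjacency dict with an edge {i, j} whenever C_ij > theta."""
    returns = {name: log_returns(p) for name, p in price_series.items()}
    graph = {name: set() for name in price_series}
    for i, j in combinations(sorted(price_series), 2):
        if correlation(returns[i], returns[j]) > theta:
            graph[i].add(j)
            graph[j].add(i)
    return graph


# Toy prices, made up purely for illustration (not market data).
prices = {
    "A": [10.0, 11.0, 10.5, 11.5, 12.0],
    "B": [20.0, 22.0, 21.0, 23.0, 24.0],  # exactly 2 x A, so the returns coincide
    "C": [30.0, 29.0, 30.5, 29.5, 29.0],  # moves against A and B
}
graph = market_graph(prices, theta=0.5)
degrees = {name: len(neighbours) for name, neighbours in graph.items()}
```

In this toy input only A and B are joined by an edge, so their degrees are 1 and C is isolated. The pairwise loop costs on the order of n² correlation computations for n stocks, which is the dominant cost of building the graph.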
The degree distribution reveals the large-scale pattern of connections in the graph, which reflects the global properties of the dataset that the graph represents. Remarkably, aside from the stock market data, the power-law model can be observed in many other practical areas including, but not limited to, biological networks, computer networks, and social networks [17-21]. This discovery led to the introduction of the concept of so-called "self-organized networks." Moreover, it turned out that this phenomenon can also be found in finance.

In the previous studies the authors related certain correlation-induced characteristics of the stock market prices to combinatorial properties of the corresponding market graph [7-9]. Specifically, they considered the problem of arranging stocks into groups of highly correlated assets. This problem was solved with simple algorithms for finding cliques and independent sets in the market graphs obtained for different threshold values. Our present study takes a different approach to analyzing the structure of the market graph. Specifically, in this paper we arrange closely related stocks into groups based on the maximum weighted path cover of the market graph. Although the maximum weighted path cover problem is known to be NP-hard in general [22], the connected components of the market graph can be found in polynomial time. Interestingly, this study has shown that there is a certain degree of similarity between the market graph configuration obtained by our present method and the one obtained by solving the clique problem. The paper also gives interesting insights into relationships between different industries derived from the market graph structure.

Preliminaries of Graph Theory

Let us first introduce some basic definitions and notation from graph theory that are used in the paper. Later we will give the appropriate interpretation of the introduced concepts as applied to data mining. Let $G = (V, E)$ denote an undirected random graph with the set of nodes $V$, $|V| = n$, and the set of undirected arcs $E = \{e = (i, j) : i, j \in V\}$. We say that the graph $G$ is connected if one can find an undirected path between every two nodes of $V$. A graph that does not satisfy this property is called disconnected. Every disconnected graph can be decomposed into a number of connected subgraphs. Such subgraphs are called the connected components of the graph.

For any subset of nodes $V_1 \subseteq V$, let $G(V_1)$ denote the subgraph of $G$ induced by $V_1$. A subset of nodes $C$ is said to be a clique if the induced subgraph $G(C)$ is complete, i.e., $G(C)$ contains all possible arcs. The problem of finding the largest clique in the graph is known as the maximum clique problem. It was proven that the maximum clique problem is NP-hard [22]. Many cases of this problem are also difficult to approximate. Arora and Safra [23] showed that for some $\epsilon > 0$ the approximation of the maximum clique problem within a factor of $n^{\epsilon}$ is NP-hard.

A path in a graph is an alternating sequence of vertices and edges in which each edge joins the vertex before it to the vertex after it. It is also assumed that a path has no cycles. A weighted graph associates a value (weight or cost) with every edge in the graph. Clearly, the market graph can be viewed as a weighted graph with the cross-correlation between two stocks being the arc weights.
The sum of the weights of the traversed edges in a weighted graph is called the weight of a path. A path cover of the graph $G$ is a set of vertex-disjoint paths that together cover the vertices of $G$. In a weighted graph, a path cover of maximum weight is referred to as the maximum weight path cover. The maximum weight path covering problem (MWCP) is the problem of finding a maximum weight path cover of a given graph. This problem has been shown to be NP-hard [24].

Motivation and Techniques for Finding Connected Component in the Market Graph

One can easily see from the construction of the market graph that two particular stocks are closely related if their corresponding nodes in the graph are connected by an edge. Consequently, we can deduce that there must be a certain degree of association between two specific stocks if their nodes in the market graph are connected by a path. Moreover, all the stocks represented by the nodes in the path can be combined into a distinct group of interdependent stocks. In this view, it seems very natural to consider the MWCP for the market graph. In fact, finding the collection of vertex-disjoint paths that has the maximum possible weight can be perceived as the "best" arrangement of closely dependent stocks into separate groups.

In order to solve the MWCP, we applied a greedy algorithm proposed by Liao et al. [24]. The method is analogous to Kruskal's maximum spanning tree algorithm [25]. Precisely, Liao's algorithm iteratively selects the edge with the greatest weight to be added to the path cover, while preserving the properties of a path. When two or more arcs are eligible to enter (i.e., they all have the same weight), the algorithm nondeterministically selects only one of them. The procedure terminates when either no additional edge can be added to the solution without violating the path properties, or there are no more edges.
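The greedy idea described above can be illustrated with a Kruskal-style sketch. It is written from the description in the text only and is not claimed to reproduce the exact algorithm of Liao et al. [24]: an edge is accepted when both endpoints still have fewer than two accepted edges (so every node stays on a simple path) and the endpoints lie in different partial paths (so no cycle is closed). The function name, the union-find helper, and the toy weights are assumptions for the example; ties are broken by input order rather than nondeterministically.

```python
def greedy_path_cover(nodes, weighted_edges):
    """Greedily pick heavy edges that keep the selection a set of vertex-disjoint paths.

    weighted_edges is an iterable of (u, v, weight) triples.
    Returns the accepted edges in the order they were chosen.
    """
    parent = {v: v for v in nodes}   # union-find forest over partial paths
    degree = {v: 0 for v in nodes}   # number of accepted edges at each node

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    chosen = []
    # Heaviest edges first; Python's sort is stable, so ties keep input order.
    for u, v, w in sorted(weighted_edges, key=lambda e: e[2], reverse=True):
        if degree[u] >= 2 or degree[v] >= 2:
            continue  # an interior path node cannot take a third edge
        root_u, root_v = find(u), find(v)
        if root_u == root_v:
            continue  # both ends are on the same path: the edge would close a cycle
        parent[root_u] = root_v
        degree[u] += 1
        degree[v] += 1
        chosen.append((u, v, w))
    return chosen


# Toy weights, made up for illustration.
edges = [
    ("a", "b", 0.9),
    ("b", "c", 0.8),
    ("a", "c", 0.7),
    ("b", "d", 0.6),
    ("c", "d", 0.5),
]
cover = greedy_path_cover(["a", "b", "c", "d"], edges)
total_weight = sum(w for _, _, w in cover)
```

On the toy input the sketch accepts a-b, b-c and c-d, rejects a-c (it would close a cycle) and b-d (b already has two accepted edges), and returns the single path a-b-c-d. Nodes that receive no edge form paths of length zero in the cover.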
The solution of the maximum weight path covering problem for a constructed market graph showed no significant reduction in the number of stock groups compared with the maximum clique method used in the previous studies. Moreover, as mentioned before, the MWCP is an NP-hard problem, and so it is computationally unattractive. Thus, the MWCP approach to grouping related stocks, though interesting, is not particularly better than the previous approach based on cliques/independent sets.

Notice that the solution of the MWCP does not include all the edges of the market graph. For market graphs with a high correlation threshold, the above approach results in a rather large number of groups with few stocks in each group. On the other hand, for high-threshold market graphs one might want to include all the edges of the market graph, since every one of them corresponds to a significant level of correlation between stocks. Essentially, this modified problem can be formulated as the problem of finding all connected components of the market graph. Clearly, this approach gives a very natural arrangement of all stocks of the market graph into separate groups of closely related stocks. Each connected component indicates a certain group of associated stocks. Although grouping by connected components generally yields groups with somewhat weaker connections than the MWCP grouping of the same market graph, this can easily be overcome by setting a higher correlation threshold in the graph.

It is widely known that all the connected components of a graph can be obtained simply by using either depth-first search or breadth-first search. Notice that both procedures can be performed in polynomial time. Here we applied the depth-first search algorithm (DFS) to find all connected components of the market graph.
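Grouping stocks by connected component with a depth-first search can be sketched as follows. This is a minimal illustration, not the thesis implementation: it uses an explicit stack instead of recursion, assumes the graph is given as a dictionary mapping each node to the set of its neighbours, and runs on a small made-up graph.

```python
def connected_components(graph):
    """Return the connected components of an undirected graph as sorted lists.

    graph maps every node to the set of its neighbours (adjacency-list form),
    so the search visits each node and each edge a constant number of times.
    """
    seen = set()
    components = []
    for start in graph:
        if start in seen:
            continue  # already reached from an earlier start node
        seen.add(start)
        stack = [start]
        component = []
        while stack:
            v = stack.pop()
            component.append(v)
            for w in graph[v]:
                if w not in seen:
                    seen.add(w)       # mark when pushed so no node is stacked twice
                    stack.append(w)
        components.append(sorted(component))
    return components


# Made-up example: two groups of linked nodes and one isolated node.
example = {
    1: {2},
    2: {1, 3},
    3: {2},
    4: {5},
    5: {4},
    6: set(),
}
groups = connected_components(example)
largest = max(groups, key=len)
```

For the example graph the search yields the groups [1, 2, 3], [4, 5] and [6]; the largest group plays the role of the giant component. The set of components does not depend on which node each search starts from, only the order in which they are reported does.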
The DFS algorithm can be briefly described as follows. First, some vertex $v$ of the graph is randomly selected and added to a stack. Then, for each node $v_1$ among the descendants of $v$ that has not been selected previously, we apply a recursive procedure by adding $v_1$ to the stack and examining all its descendants $v_2$ in a similar fashion. Precisely, if $v_2$ has not been selected yet, we add it to the stack and then examine all the descendants of $v_2$ by choosing a particular descendant and proceeding recursively. A node is taken out of the stack after all its adjacent vertices have been visited. After all the nodes that can be reached from the initial node $v$ by some path have been examined, we choose the next $v$ from those vertices of the graph that have not been visited yet. The algorithm terminates when all the vertices in the graph have been visited.

The output of the DFS algorithm is a set of depth-first search trees, where each tree represents a connected component of the undirected graph. The number of connected components is equal to the number of depth-first search trees. Moreover, the outcome of the scheme does not depend on the choice of the initial vertex. The main advantage of this approach is that the DFS algorithm runs in polynomial time. In particular, the procedure takes $O(n + |E|)$ time (with $|E|$ being the number of edges in the graph) if the input is represented by an adjacency list, while it takes $O(n^2)$ time if the input is given in the form of an adjacency matrix. Taking into consideration the size of the input for the market graph problem, an adjacency matrix representation is preferred.

Structure of Connected Components in the Market Graph

First, we applied the depth-first search algorithm to the market graph with a correlation threshold of 0.7. The resulting stock arrangement had a large number of clusters with very few stocks in each group. Clearly, the stocks in each obtained group were strongly linked.
Decreasing the number of groups formed would allow one to see a less pronounced pattern of connections between the stocks. This is achieved by decreasing the correlation threshold of the market graph, so that the resulting market graph also takes into account somewhat weaker connections. Next, we applied the DFS algorithm to the market graph with the threshold value set at 0.5. As expected, the algorithm produced a stock arrangement with a smaller number of groups and, in general, a larger number of stocks in each group. This follows from the trivial but important observation that all the connected components at a higher threshold stay connected at the lower threshold and may also become connected to other connected components (Table 2-1). For the market graph with a threshold of 0.5, the largest group in the stock arrangement has a total of 269 stocks and represents both the technology and finance industries. An important observation is that each connected component in the market graph corresponds to a distinct industry sector.

It should be noted that this approach provides a natural way of clustering stocks. Clustering is a well-known challenging problem arising in data mining [26]. It deals with partitioning a dataset into sets (clusters) of elements grouped according to some similarity criterion. The main difficulty one encounters in solving the clustering problem on a given dataset is that the number of desired clusters of similar objects is usually not known a priori; moreover, an appropriate similarity criterion has to be chosen before the dataset is partitioned into clusters. With the network representation of the stock market, the clustering problem is treated as graph partitioning, where the subgraphs in the partition correspond to different clusters. The above results suggest that partitioning the market graph into distinct connected components is a reasonable approach in the framework of clustering stocks.
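The observation that components found at a higher threshold stay connected at a lower one can be checked directly on a small example. The sketch uses a union-find structure rather than DFS, purely for brevity; the correlation values and node names are invented for illustration and are not taken from the thesis data.

```python
def components_at(nodes, correlations, theta):
    """Connected components of the graph that keeps edges with correlation > theta.

    correlations maps a pair (u, v) to its correlation coefficient.
    Returns a list of frozensets, ordered by their smallest member.
    """
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for (u, v), c in correlations.items():
        if c > theta:
            parent[find(u)] = find(v)  # merge the two groups joined by this edge

    groups = {}
    for v in nodes:
        groups.setdefault(find(v), set()).add(v)
    return sorted((frozenset(g) for g in groups.values()), key=min)


# Invented correlations on a chain a-b-c-d-e.
nodes = ["a", "b", "c", "d", "e"]
correlations = {
    ("a", "b"): 0.80,
    ("b", "c"): 0.65,
    ("c", "d"): 0.55,
    ("d", "e"): 0.30,
}
high = components_at(nodes, correlations, theta=0.7)
low = components_at(nodes, correlations, theta=0.5)

# Every high-threshold group is contained in some low-threshold group.
nested = all(any(h <= l for l in low) for h in high)
```

At threshold 0.7 the example splits into four groups ({a, b} plus three singletons); at 0.5 it has only two ({a, b, c, d} and {e}). Lowering the threshold can only add edges, so groups merge but never split, which is why `nested` holds.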
Size of Connected Components in the Market Graph in the Context of Power-Law Model

As mentioned above, the market graph follows the power-law model. The asymptotic properties of power-law random graphs, including the size of their connected components, have been studied theoretically. It is important to mention the existence of a giant connected component (the unique largest component in the graph when the average degree is greater than 1) in a power-law graph with $\gamma < \gamma_0 \approx 3.47875$, and the fact that a giant connected component does not exist otherwise. The emergence of a giant connected component at the point $\gamma_0 \approx 3.47875$ is referred to as the phase transition. As was found in [8], the values of $\gamma$ for the considered instances of the market graph were smaller than this threshold value. Therefore, one would expect to find a large connected component in the market graph. The results presented in this section confirm this hypothesis. Notice that the arrangement of stocks into groups for the market graph with the threshold of 0.5 (presented in Table 2-1) clearly shows the presence of a giant component (represented by the financial services/technology group). Observe also that the size of the giant component is significantly larger than the sizes of all the other groups.

One may pose the following logical question: how does the size of the largest component in the market graph change over a certain period of time? To answer this question, we constructed different market graphs with a given value of the threshold for 11 adjacent time periods. Specifically, in order to examine the dynamics of the market graph structure, we selected a time period of 1000 trading days in 1998-2002 and considered eleven 500-day shifts within this period. The starting dates of any two consecutive shifts are separated by a 50-day interval. In other words, each pair of successive shifts had 50 different days and the remaining 450 days in common.
The time shifts considered in this paper are the same as the ones considered in the previous studies [9]. This method lets us capture the structural changes of the market graph using comparatively small intervals between shifts, and at the same time allows us to maintain sufficiently large samples of stock price data to compute the cross-correlations for each time period. Also note that in our analysis we took into consideration only stocks that were traded during the given 1000 trading days (i.e., for practical reasons we did not take into account stocks that had been withdrawn from the market).

We considered three different values of the correlation threshold, namely 0.7, 0.6, and 0.5. For each threshold value, the corresponding market graphs were constructed for all eleven time periods. For each of the eleven periods we ran the DFS algorithm to find all connected components in the associated market graph, and we computed the size of the largest group (i.e., the giant connected component) formed in each individual period. Figure 2-1 shows the largest group sizes obtained in all 11 time periods for each value of the threshold. It can be seen that for all three thresholds the size of the largest group of related stocks follows an overall increasing trend. Precisely, in all three cases the giant component size predominantly increases from the oldest time period (period 1) to the most recent one (period 11). This clearly visible overall increase in the largest group size can be explained by the globalization tendency in the market. Note that this fact was also mentioned in [9] in the context of the growth of the edge density and the maximum clique size in the market graph.
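The arithmetic behind the eleven shifts can be verified with a short sketch. The function name and the convention of 0-based, half-open day ranges are assumptions made for the example; only the numbers 1000, 500 and 50 come from the text.

```python
def shifts(total_days, window, step):
    """Half-open (start, end) day ranges of every full window inside total_days."""
    return [(s, s + window) for s in range(0, total_days - window + 1, step)]


periods = shifts(total_days=1000, window=500, step=50)

# Days shared by two successive shifts: the window length minus the step.
overlap = periods[0][1] - periods[1][0]
```

With these inputs the start offsets are 0, 50, ..., 500, which gives 1 + (1000 - 500) / 50 = 11 periods, and successive periods share 500 - 50 = 450 trading days. For each range one would slice the price series, rebuild the market graph, and record the size of its largest connected component.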
Structure of Connected Components in the Market Graph

Another issue related to the size dynamics that deserves special consideration is how the structure of the giant connected component in the market graph changes across time periods. To investigate this question, we set the threshold value at 0.7 and constructed the corresponding market graphs for all eleven time periods above. Using the DFS algorithm, we found the giant component along with the other connected components in each of the obtained market graphs. The giant connected components for all eleven time shifts are given in Table 2-3. It appears that in most cases stocks that belong to the giant connected component during an earlier period are also included in the giant component in later periods.

There are some other interesting observations about the stock structure of the largest group found for different time periods. All the giant connected components contain a large number of stocks of companies representing the "high-tech" industry sector. Furthermore, each giant component includes stocks of companies related to the semiconductor industry, and the number of these stocks in the largest group increases with time. All these facts imply that the corresponding branches of industry expanded during the considered period of time to form a major cluster in the market. Additionally, we found that in the later periods (particularly in the last 2 of the 11 periods) the giant connected components in the market graphs contain quite a significant number of exchange-traded funds (stocks reflecting the behavior of certain indices representing various groups of companies). It should be mentioned that all giant connected components include the Nasdaq 100 tracking stock (QQQ), which was also found to be the vertex with the highest degree (i.e., correlated with the most stocks) in the market graph [7-8].
Concluding Remarks

We extended the methodology of representing the stock market as a graph. We have shown that partitioning the market graph into a set of connected components provides reasonable results in the context of data mining, in particular for clustering stocks into groups with similar behavior. Moreover, we observed similar patterns in the sizes of connected components in most instances of the market graph, with one large connected component and several small ones. Since the market graph follows a power law with a small parameter $\gamma$, this observation is consistent with theoretical results obtained for the power-law random graph model, which indicate the existence of a giant connected component in such graphs. Our study confirmed that the recently introduced network-based approach is promising for studying stock market dynamics. We believe that this methodology can be further developed and generalized to take into account various factors affecting the market and to assist researchers and practitioners in making strategic decisions.

Table 2-1.
Arrangement of stocks into groups for the market graph with threshold of 0.5

Industry: Stocks

Basic materials, copper and aluminum: AA, AL, N, PD

Financial services/technology: AAPL, CSCO, ALTR, ADI, AMAT, AMCC, ANAD, ASML, ATML, CY, IDTI, INTC, CMGI, AMTD, AOL, AMZN, DCLK, ET, ELNK, INKT, CNET, NITE, NTBK, RNWK, NTAP, CHKP, QQQ, ADBE, MDY, ADCT, AFCI, BRCM, JDSU, BVSN, ELX, QLGC, KLAC, CMOS, KLIC, ASYT, HELX, LRCX, CYMI, LLTC, DIA, AIG, AXP, BAC, BBT, ASO, CMA, BK, BTO, ABK, HIG, CB, SPC, JP, LNC, TMK, MBI, AF, CF, GPT, FVB, CBSS, FBF, C, BSC, AGE, JPM, COF, HI, GSB, GDW, KEY, FITB, FRE, FNM, MEL, HBAN, NCC, PNC, NTRS, RF, CFR, CYN, HU, MI, MRBK, STI, NFB, ONE, WB, SPY, ADX, MIM, AMO, ATF, SBC, BLS, T, VZ, DJM, EWF, EWQ, ALA, STM, ERICY, EWD, MSCI, EWG, DT, COLT, CWP, FTE, KPN, TEF, EWI, BBV, EWP, EWN, ABN, AEG, AXA, STD, EWU, PHG, NOK, ITWO, SEBL, MXIM, LSCC, LSI, NSM, PMCS, MERQ, VRSN, SUNW, CMVT, NT, BCE, XLNX, MCHP, VTSS, RFMD, TQNT, SWKS, TXCC, TXN, MOT, MU, TER, RTS, MCRL, DELL, MSFT, YHOO, IBM, ORCL, VODTI, PT, LEH, LM, RJF, MER, MWD, EWH, APB, APF, RR, GCH, TCH, JFC, TDF, CHL, EWS, MLF, GE, SCH, STT, UPC, SNV, SOTR, TCB, WL, WM, WFC, USB, MXF, BZF, BZL, LAQ, FMX, EKT, EWW, LDF, MSF, UBB, ELP, TBH, TAR, BFR, IRS, TEO, TDP, TMX, TFONY, TV, KOF, MXE, TZA, TY, USA, AMGN, MEDI, CHIR, GENZ, HQH, ALKS, CEGE, HGSI, ABGX, PDLI, INCY, MEDX, MLNM, AFFX, HQL, LYNX, MYGN, GLGC, VRTX, ASG, CCU, KSU, NOVL, PAYX, TLAB, WABC, VLY, KRB, CCR, PVN, MMC, CIEN, DISH, SANM, BGEN, CREE, CTXS, FLEX, HLIT, IBIS, INTU, ISSX, NXTL, QCOM, SFE, SMTC, SPOT, SWS, SIEB, JBOH, MHMY, TFSM, BRKS, COHU

Gold ore industries: ABX, AEM, ASA, AU, DROOY, HGMCY, GFI, NEM, KGC, PDG, ECO, GLG, TVX

Healthcare: ACVA, ACV

Financial, credit/personal credit institutions: ADVNA, ADVNB

Utilities/services sectors: AEE, AEP, AYE, CEG, CIN, D, DUK, ED, DTE, DQE, IDA, AVA, VKL, LNT, NI, PEG, ETR, FPL, PNW, EXC, PSD, OGE, FE, PGN, PPL, REI, PCG, SO, TE, HE, WEC, WPS, TXU, XEL, ILA, DPL

Basic materials/energy sectors: AHC, APA, APC, BJS, ATW, DO, BHI, CAM, ESV, GLBL, GSF, HAL, HP, NBL, BR, UCL, MUR, PEO, DVN, NE, NBR, NOL, PDE, GW, PKD, PDS, PTEN, MVK, RDC, PGO, RIG, SII, SLB, TDW, TMAR, VRC, OII, WFT, VTS, OEI, PPP, EOG, KMG, COP, CVX, SC, BP, RD, TOT, XOM

Investment banking: AKOA, AKOB

Transportation sectors: ALK, CAL, AMR, DAL, LUV, UAL, NWAC

Healthcare/pharmaceutical preparations: AZN, GSK

Basic materials/consumer goods sector: BCC, BOW, GP, IP, PCH, RYN, TIN, WY

Financial/banking sectors: BCM, BMO, RY, TD

Consumer goods, tires and inner tubes: BDGA, BDG

Computers and banking sectors: DOCC, FLBK, IBCA, PBIX, SCAI, SOV

Pharmaceutical preparations: BMY, SGP, JNJ, MRK, PFE

Consumer goods/financial: EWJ, HIT, SNE, MTF, NTT, TM

Indian financial services: IFN, IGF, IIF, JFI

Korea technology/finance: KEF, KF, SKM

Media/technology: CYLK, HOLL, NAVR, SHRP

Plastics materials and resins: DD, DOW, PPG, ROH

Table 2-2. Dates and mean correlations corresponding to each 500-day shift

Period #   Starting date   Ending date
1          9/24/1998       9/15/2000
2          12/4/1998       11/27/2000
3          2/18/1999       2/8/2001
4          4/30/1999       4/23/2001
5          7/13/1999       7/3/2001
6          9/22/1999       9/19/2001
7          12/2/1999       11/29/2001
8          2/14/2000       2/12/2002
9          4/26/2000       4/25/2002
10         7/7/2000        7/8/2002
11         9/18/2000       9/17/2002

Table 2-3. Stocks contained in largest size group for eleven time periods (1 being the oldest period, and 11 being the most recent).
Time Stocks Size 11 ABGX, BBH, AMGN, CHIR, CRA, MLNM, HGSI, MEDX, PDLI, 202 IJH, AGE, DIA, C, FBF, IYF, BAC, IYG, GE, IVE, EWG, ABN, EZU, AXA, AEG, ING, EWQ, BBV, EWP, TEF, DT, FTE, STD, EWD, EWI, SPY, BDH, ADI, ALTR, AMAT, AMCC, BRCM, IAH, ATML, CY, FCS, IYW, AMD, NVLS, ASML, IFX, PHG, ALA, STM, EPC, INTC, DELL, IVW, BHH, ARBA, IYV, BEAS, IIH, CHKP, MERQ, QQQ, ARMHY, BRCD, ELX, QLGC, CIEN, JDSU, XLK, CLS, FLEX, IWF, CSCO, JNPR, MXIM, IDTI, LLTC, IRF, KLAC, BRKS, CMOS, SMH, CYMI, LRCX, KLIC, LSCC, MCHP, TXN, IWO, HHH, AOL, EBAY, IJR, IVV, IWB, IWD, IWM, IWN, IWV, IWW, IYC, IWZ, IYJ, IYY, IYZ, ATF, SBC, TTH, VZ, MKH, MDY, MWD, BSC, GS, LEH, LM, MER, XLF, AIG, BK, JPM, RKH, BBT, UPC, FVB, MI, NCC, STI, RF, CMA, MEL, ONE, USB, WB, STT, WFC, XLV, VIAB, VIA, XLI, XLY, MSFT, NOK, SBF, YHOO, SSTI, XLNX, LSI, PMCS, VTSS, TER, LTXX, VSEA, MCRL, MU, NEWP, NSM, NTAP, SEBL, EXTR, FDRY, VRTS, SMTC, SUNW, EMC, IBM, ORCL, JBL, SANM, CREE, NVDA, CNXT, ITWO, KEI, KOPN, MRVC, QCOM, RFMD, TQNT, SCMR, SNDK, VRSN, VSH, KEM, BVSN, CMRC, IWOV, MOT, NT, DCX, VRTX, DNA, GILD, IDPH, MEDIA, MYGN 10 ABGX, BBH, AMGN, CRA, MLNM, HGSI, MEDX, PDLI, IJH, BDH, 159 ALA, PHG, ASML, AMAT, ALTR, ADI, CY, ATML, IAH, AMCC, BHH, ARBA, IYV, BEAS, IIH, BVSN, IYW, AMD, NVLS, CMOS, SMH, BRCM, IWF, CSCO, EMC, BRCD, ELX, QLGC, JNPR, CIEN, EXTR, FDRY, QQQ, AVNX, CHKP, SEBL, MERQ, VRTS, NTAP, XLK, CLS, FLEX, SANM, JBL, CREE, DELL, INTC, IVW, C, DIA, GE, IVE, IJR, IVV, HHH, AOL, EBAY, IWB, IWD, IWM, IWV, IYC, IYY, IYF, BAC, IYG, JPM, SPY, MDY, MWD, GS, LEH, BSC, LM, MER, XLF, FBF, RKH, PNC, SSTI, STM, EPC, IFX, KLAC, BRKS, LLTC, IDTI, LSCC, LRCX, LTXX, MXIM, IRF, LSI, XLNX, MCHP, PMCS, VTSS, TXN, SMTC, TER, NOK, XLI, XLV, DIS, VIAB, VIA, XLY,YHOO, MSFT, NEWP, SUNW, JDSU, NVDA, ORCL, CMRC, GLW, ITWO, MRVC, QCOM, RFMD, TQNT, SWKS, SCMR, SNDK, VRSN, MU, NSM, KLIC, IWOV, IBM, EWG, BBV, EWP, EWQ, EWI, STD, TEF, DT, FTE, EWD, MOT, NT, VRTX, DNA, IDPH, MEDI 9 ADI, ALTR, AMAT, AMCC, BDH, ATML, CY, IAH, BEAS, IIH, ARBA, 
110 BHH, CMRC, QQQ, ARMHY, ASML, IFX, PHG, ALA, STM, MDY, DIA, C, JPM, XLF, BAC, FBF, MWD, GS, LEH, BSC, MER, SPY, GE, HHH, AOL, EBAY, XLK, BRCM, PMCS, CSCO, JDSU, CIEN, JNPR, GLW, SUNW, EMC, BRCD, ELX, QLGC, NTAP, SEBL, CHKP, MERQ, VRTS, LLTC, KLAC, IRF, MXIM, LSCC, IDTI, LRCX, NVLS, CMOS, INTC, DELL, TXN, XLNX, LSI, MCHP, VTSS, TXCC, TER, CREE, EXTR, FLEX, CLS, SANM, MSFT, NEWP, NVDA, ORCL, SSTI, TQNT, RFMD, SWKS, YHOO, XLI, XLV, VIAB, VIA, PNC, NCC, XLY, NOK, AVNX, BVSN, DIGL, ITWO, MRVC, SCMR, VRSN, IWOV, IBM, NSM, NT, MU Table 23. Continued Time Stocks Size 8 ADI, QQQ, ALTR, AMAT, AMCC, BRCM, PMCS, CSCO, EMC, SUNW, 92 XLK, ASML, PHG, STM, ALA, ATML, CY, LSI, XLNX, INTC, DELL, NVLS, CMOS, KLAC, LLTC, LSCC, IDTI, LRCX, TXN, MXIM, VTSS, JDSU, CIEN, JNPR, BEAS, CHKP, MERQ, SEBL, ITWO, MDY, DIA, C, JPM, XLF, BAC, BK, FBF, PNC, NCC, SPY, GE, HHH, AOL, EBAY, VRSN, YHOO, XLI, XLV, VIAB, VIA, NTAP, BRCD, ELX, QLGC, VRTS, ORCL, GLW, TXCC, TER, KLIC, MCHP, NSM, TQNT, RFMD, SWKS, NOK, CREE, EXTR, FLEX, CLS, SANM, MSFT, AVNX, BVSN, CMRC, ARBA, CNXT, DIGL, IRF, MRVC, NEWP, SCMR 7 ABGX, MEDX, BBH, AMGN, DNA, HGSI, MLNM, PDLI, IDPH, MDY, 95 ATML, AMAT, ALTR, LSCC, IDTI, QQQ, AMCC, BRCM, PMCS, CSCO, EMC, SUNW, XLK, ASML, PHG, STM, ALA, NOK, TXN, ADI, CY, LSI, XLNX, KLAC, LRCX, NVLS, KLIC, MXIM, LLTC, VTSS, TXCC, TER, MCHP, BEAS, JNPR, CIEN, MERQ, SEBL, CHKP, NTAP, QLGC, BRCD, ELX, VRTS, ORCL, CREE, DELL, FLEX, CLS, SANM, HHH, AMZN, AOL, EBAY, SPY, C, DIA, XLF, BAC, BK, FBF, JPM, PNC, NCC, XLI, XLV, VIAB, VIA, GE, MIM, YHOO, INTC, JDSU, GLW, TQNT, RFMD, SWKS, VRSN, BVSN, CMRC, ARBA, EXTR, ITWO, SCMR, MEDI 6 ALA, STM, ASML, PHG, QQQ, ALTR, LSCC, ATML, AMAT, AMCC, 71 BRCM, JDSU, CSCO, SUNW, EMC, XLK, CIEN, CREE, FLEX, GLW, INTC, JNPR, KLAC, LRCX, NVLS, KLIC, TER, TXN, XLNX, LLTC, MXIM, VTSS, PMCS, TXCC, MDY, DIA, C, JPM, XLF, BAC, BK, FBF, PNC, SPY, MIM, XLI, XLV, VIAB, VIA, NTAP, SEBL, VRTS, ORCL, QLGC, ELX, TQNT, RFMD, SWKS, VRSN, BEAS, MERQ, BRCD, BVSN, CHKP, CMGI, ICGE, CMRC, 
ARBA, ITWO, RBAK, NOK
5 (size 63): ALA, STM, ASML, PHG, QQQ, ALTR, XLK, AMAT, AMCC, BRCM, PMCS, JDSU, CSCO, SUNW, EMC, VTSS, TXCC, QLGC, ELX, KLAC, LRCX, NVLS, TER, MXIM, LLTC, XLNX, ATML, LSCC, TXN, CIEN, FLEX, INTC, MDY, SPY, DIA, XLF, BAC, C, JPM, FBF, PNC, XLI, MIM, MLF, XLV, NOK, NTAP, SEBL, VRTS, ORCL, TQNT, RFMD, SWKS, VRSN, BEAS, MERQ, BVSN, CHKP, CMGI, CMVT, GLW, ITWO, JNPR
4 (size 57): ALTR, QQQ, AMAT, KLAC, LRCX, NVLS, TER, XLK, AMCC, BRCM, PMCS, VTSS, ATML, XLNX, LLTC, MXIM, LSCC, TXN, CSCO, JDSU, SUNW, EMC, INTC, MDY, SPY, DIA, XLF, BAC, C, JPM, FBF, PNC, XLI, MIM, MLF, XLV, ORCL, QLGC, ELX, SEBL, STM, ASML, PHG, NOK, VRTS, BEAS, MERQ, CHKP, CIEN, CMGI, CMVT, ITWO, NTAP, TQNT, SWKS, TXCC, VRSN
3 (size 35): ALTR, XLNX, LLTC, MXIM, QQQ, AMAT, KLAC, LRCX, NVLS, TER, XLK, CSCO, EMC, SUNW, SPY, DIA, MDY, MIM, MLF, XLI, XLV, INTC, JDSU, SEBL, TXN, AMCC, PMCS, BRCM, CMGI, ITWO, NTAP, QLGC, VRSN, VRTS, VTSS
2 (size 24): ALTR, XLNX, QQQ, AMAT, KLAC, LRCX, NVLS, TER, AMCC, PMCS, CSCO, SPY, DIA, MDY, MIM, MLF, SUNW, EMC, INTC, JDSU, MXIM, LLTC, SEBL, VRTS
1 (size 15): ALTR, XLNX, QQQ, AMAT, KLAC, LRCX, NVLS, TER, CSCO, SPY, DIA, MDY, MIM, SUNW, INTC
Figure 2-1: Largest group size by time period, shown in panels A, B, and C for time periods 1 to 11 (A corresponds to the threshold value of 0.7, B corresponds to the threshold of 0.6, and C pertains to the market graph with threshold 0.5)
CHAPTER 3
EVOLUTION OF SOCIAL NETWORK
Social networks have attracted many scientists in recent years, especially because of their applications to social processes such as the study of disease spreading [27-29], urbanization studies [30-32], the discovery of networks of innovators [33], and many other fields. Social scientists are also interested in studying evolving social networks as a dynamic process [34].
In this chapter, we present a model that dynamically simulates a social network under standard assumptions already present in the literature, and we show that the model accurately mimics a real-world network.
Introduction
Social networks, like the web graph and the finance graph, evolve over time. In fact, the rate at which they change may be higher than that of other real-world networks. They exhibit network transitivity: nodes tend to connect if they share a mutual neighbor [4, 35]. They also exhibit high degree correlation, or assortative mixing [36]: nodes of high degree tend to connect with other nodes of high degree. Individuals in a social network can have identities [35, 37, 38], characteristics of a specific node that help in a hierarchical classification of nodes. For instance, in a student community, two students classified in the same group have more opportunities to establish a relationship than two students who are strangers to each other. Such identities could be that the students take the same classes, play a common sport, or share the same musical interests. The small-world phenomenon is another feature of social networks [4-6]: nodes are separated by a distance of 6 or less on average. These features are also some of the parameters that help in simulating our model. Most complex networks exhibit preferential attachment and the addition of new vertices [2]. An important point is that, unlike other complex real-world networks such as biological or technological networks, not all social networks exhibit preferential attachment and the addition of new vertices over time. The absence of these features, especially of preferential attachment, makes such networks follow a single-scale distribution rather than a power-law distribution [3]. Of course, social networks such as author collaboration and movie actor networks do possess these two characteristics.
Social networks have a wide range of applications, such as the study of diseases in a network, identifying innovators in an author collaboration network, analyzing patterns of migration among people, and investigating criminal behavior using financial flows.
Model
We propose a model in which we dynamically simulate a social network based on the strength of the relationship [39] and the identities of the individuals [37, 38] involved in the relationship. Other factors influencing the growth of the network are the mutual acquaintances shared by the individuals [40] and the time of their last contact [34, 41]. The reasons for considering these parameters are quite intuitive. A stronger relationship takes more time to fade or dissolve. Relationships also last longer with frequent contact and fade away if individuals do not stay in touch, and a relationship strengthens with more and more contacts. The mutual acquaintances shared by the individuals provide opportunities for the individuals to meet [39]. At each time interval we pick two random nodes and add an edge between them with probability $p_m f_x$, where
$p_m = p\,e^{\ldots}$ (3.1)
$f_x = 1 - (1 - x^{a})^{b}$ (3.2)
Here $p_m$ is the probability that an edge will be added between two nodes with $m$ mutual acquaintances between them. It has an exponential form, with the value of $p_m$ increasing exponentially with the number of mutual acquaintances [39]. $f_x$ is the probability that an edge will be added between two nodes that have common identities. Each node is assigned a set of identities from a global set according to a uniform distribution, and $x$ is a measure of the common identities between the two nodes, with support from 0 to 1. $f_x$ follows the Kumaraswamy distribution [42], a double-bounded distribution that resembles the beta distribution and is well suited to simulation studies. The CDF $f_x$ increases monotonically with $x$.
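As a minimal sketch of equation (3.2), the Kumaraswamy CDF can be evaluated as shown below. The function names, the use of a Jaccard ratio as the measure x of common identities, and the shape parameter values a = 2 and b = 3 are illustrative assumptions; the model description does not specify them.

```python
def kumaraswamy_cdf(x: float, a: float, b: float) -> float:
    """Kumaraswamy CDF on [0, 1], equation (3.2): 1 - (1 - x**a)**b."""
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must lie in [0, 1]")
    if a <= 0 or b <= 0:
        raise ValueError("shape parameters a and b must be positive")
    return 1.0 - (1.0 - x ** a) ** b


def identity_overlap(ids_u: set, ids_v: set) -> float:
    """Assumed measure of common identities: Jaccard ratio in [0, 1]."""
    union = ids_u | ids_v
    if not union:
        return 0.0
    return len(ids_u & ids_v) / len(union)


if __name__ == "__main__":
    # Hypothetical identities drawn from a global set {0, ..., 9}.
    x = identity_overlap({1, 2, 3}, {2, 3, 4})
    # Assumed shape parameters a = 2, b = 3.
    print(x, kumaraswamy_cdf(x, a=2.0, b=3.0))
```

Any other overlap measure with values in [0, 1] could replace the Jaccard ratio without changing the role of the CDF.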
Also, if an edge is added, we consider each non-mutual neighbor of one node together with the other node as a potential relation, and perform the edge additions with probability $p_m f_x$. If the nodes considered already share an edge, we still calculate the probabilities and increase the weight of the relationship by one, strengthening the relationship. We then again pick two random nodes, this time considering them for deletion. The probability of deletion is equal to $t_x\,w_x\,f_x$, where
$t_x$ is the probability of deletion with $x$ as a function of the time of last contact,
$w_x$ is the probability of deletion with $x$ as a function of the weight of the relationship,
$f_x$ is the probability of deletion with $x$ as a function of mutual acquaintances.
Each of the above probabilities follows a gamma distribution, decreasing exponentially as the value of $x$ increases, given by
$\frac{x^{k-1} e^{-x/\theta}}{\theta^{k}\,\Gamma(k)}$
The simulation ends when the average degree of the network reaches 5. The number of edges in the network will then be O(n), giving a sparse network. A question that might arise is why nearest neighbors are considered when adding edges but not when deleting edges. This additional consideration can still be justified as a fair operation. The stopping condition is an average degree of 5, so the addition-to-deletion consideration is 1 to 5 or less at every time interval. Also, the nearest-neighbor consideration for adding edges is a direct consequence of the highly clustered nature of social networks.
Performance Analyses of the Model
The key performance measures of the model used for simulation are
* Clustering coefficient
* Assortative mixing coefficient
* Average length of the giant component (small-world phenomenon)
* Degree distribution (power-law distribution)
$\text{Clustering coefficient} = \frac{3 \times \text{Number of Triangles}}{\text{Number of Triples}}$ (3.3)
It is the tendency of two nodes that are connected by a path of length two to associate, which essentially forms the triangles in the network.
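Equation (3.3) can be illustrated with a short sketch. The adjacency-set representation and the function name are assumptions made for illustration only, not the implementation used for the simulations. Each triangle closes three connected triples, one centered at each of its vertices, so counting closed triples directly yields the numerator 3 × (number of triangles).

```python
from itertools import combinations


def clustering_coefficient(adj: dict) -> float:
    """Equation (3.3) for an undirected simple graph.

    adj maps each node to the set of its neighbors (assumed representation).
    """
    triples = 0  # connected triples: paths of length two centered at a node
    closed = 0   # triples whose end nodes are adjacent (= 3 * triangles)
    for center, neighbors in adj.items():
        for u, w in combinations(neighbors, 2):
            triples += 1
            if w in adj[u]:
                closed += 1
    return closed / triples if triples else 0.0


if __name__ == "__main__":
    # Hypothetical example: a triangle 0-1-2 with a pendant node 3 attached to 2.
    graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
    print(clustering_coefficient(graph))
```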
As mentioned earlier, this clustering tendency is the motivation for considering nearest neighbors while adding edges. The assortative mixing coefficient [36] is given by
$r = \frac{M^{-1}\sum_i j_i k_i - \left[M^{-1}\sum_i \tfrac{1}{2}(j_i + k_i)\right]^2}{M^{-1}\sum_i \tfrac{1}{2}(j_i^2 + k_i^2) - \left[M^{-1}\sum_i \tfrac{1}{2}(j_i + k_i)\right]^2}$ (3.4)
where $j_i$ and $k_i$ are the degrees of the vertices at the two ends of the $i$th edge and $M$ is the number of edges. This coefficient measures the tendency of nodes with high degree to connect with other nodes having high degree. The average shortest distance within the giant component should be less than 6 for the social network to exhibit the small-world phenomenon. The degree distribution should follow a power-law distribution to ensure scale-free characteristics.
Results
Simulation results are tabulated in Table 3-1. The results are averages of 10 test runs for each node count from 500 to 2000.
Conclusions
We saw that the simulated model has the features desired in a real-world network, with high correlation. The R-square value reported from the regression analyses is as high as 95%, indicating scale-free characteristics. This strengthens our confidence in the assumptions about social networks on which the model was built. The model should also help in studying social networks that change over time. It could be extended to social networks in which new vertices are added over time, and this should not affect any of the characteristics already present in the model. Future work could also include more sophisticated ways of assigning identities to nodes. An implementation of one such assignment, in which the nodes are distributed in a unit hypercube with each dimension representing an identity and the probability of adding an edge is inversely proportional to the distance between nodes, is already in progress.
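The assortative mixing coefficient of equation (3.4), used above as a performance measure, can be sketched as follows. The edge-list input and the function name are assumptions made for illustration; the sketch computes node degrees from the edge list and then applies the formula of [36] term by term.

```python
from collections import Counter


def assortativity(edges: list) -> float:
    """Equation (3.4) for an undirected simple graph given as (u, v) pairs."""
    if not edges:
        raise ValueError("the graph has no edges")
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    m = len(edges)
    # Edge averages of j*k, (j + k)/2 and (j^2 + k^2)/2.
    mean_jk = sum(degree[u] * degree[v] for u, v in edges) / m
    mean_half = sum(0.5 * (degree[u] + degree[v]) for u, v in edges) / m
    mean_sq = sum(0.5 * (degree[u] ** 2 + degree[v] ** 2) for u, v in edges) / m
    denominator = mean_sq - mean_half ** 2
    if denominator == 0:
        # All edge ends have the same degree (e.g. a regular graph).
        raise ValueError("coefficient undefined: no variance in end degrees")
    return (mean_jk - mean_half ** 2) / denominator


if __name__ == "__main__":
    # Hypothetical example: a star, where the hub connects only to leaves.
    print(assortativity([(0, 1), (0, 2), (0, 3)]))
```

A positive value indicates that high-degree nodes tend to attach to other high-degree nodes, which is the behavior reported for the simulated networks.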
Table 3-1: Clustering coefficient (CC), assortative mixing coefficient (AM) and average length of the giant component
     Nodes   CC     AM     Average Length   Giant Component
1    500     0.49   0.83   3.41             245
2    800     0.42   0.79   3.78             434
3    1000    0.42   0.81   4.13             538
4    1200    0.37   0.77   4.39             695
5    1500    0.34   0.76   4.19             875
6    1700    0.32   0.75   4.32             1047
7    2000    0.34   0.75   4.56             1309
Figure 3-1: Degree distribution for n = 500 nodes (R-square = 0.9056)
Figure 3-2: Degree distribution for n = 800 nodes (R-square = 0.9563)
Figure 3-3: Degree distribution for n = 1000 nodes (R-square = 0.9535)
Figure 3-4: Degree distribution for n = 1200 nodes (R-square = 0.9511)
Figure 3-5: Degree distribution for n = 1500 nodes (R-square = 0.9549)
Figure 3-6: Degree distribution for n = 1700 nodes (R-square = 0.9535)
Figure 3-7: Degree distribution for n = 2000 nodes (R-square = 0.9541)
CHAPTER 4
CONCLUSIONS
In Chapter 2, we discussed a quick and efficient way to identify clusters of similar stocks in the market graph. From the results, we concluded that our assumption that connected nodes must be highly correlated is fair. In Chapter 3, we proposed a model that simulates the growth of social networks by considering weights for relationship strength and identities. Our simulation produced social network models with many of the features desired in real-world social networks, including high clustering and assortative mixing coefficients, the small-world phenomenon, and a scale-free distribution. Based on the results, we then concluded by justifying the assumptions made to model the growth. The analysis of complex networks is a source of knowledge that helps us become acquainted with real-world systems and make better decisions.
We hope that this thesis offers useful insights and motivates researchers to pursue work in the field of complex networks, especially social networks. Special emphasis is placed on social networks to highlight their increasing role in various social activities over the last decade.
LIST OF REFERENCES
1. Erdős P, Rényi A. On the evolution of random graphs. Publ. Math. Inst. Hung. Acad. Sci. 1960;5:17-61.
2. Barabási AL, Albert R. Emergence of scaling in random networks. Science October 15, 1999;286:509-12.
3. Amaral LAN, Scala A, Barthelemy M, Stanley HE. Classes of small-world networks. Proc. Natl. Acad. Sci. 2000;97:11149-52.
4. Watts D, Strogatz SH. Collective dynamics of 'small-world' networks. Nature 1998;393:440-42.
5. Milgram S. The small world problem. Psychology Today 1967;2:60-67.
6. Strogatz SH. Exploring complex networks. Nature 2001;410:268-76.
7. Boginski V, Butenko S, Pardalos PM. On structural properties of the market graph. In: Nagurney A, editor. Innovations in financial and economic networks. Northampton, MA: Edward Elgar Publishers, 2003.
8. Boginski V, Butenko S, Pardalos PM. Statistical analysis of financial networks. Computational Statistics and Data Analysis 2005;48(2):431-43.
9. Boginski V, Butenko S, Pardalos PM. Mining market data: A network approach. Computers and Operations Research (in press).
10. Abello J, Pardalos PM, Resende MGC. On maximum clique problems in very large graphs. DIMACS Series, vol. 50. Providence, RI: American Mathematical Society; 1999. p. 119-30.
11. Aiello W, Chung F, Lu L. A random graph model for power-law graphs. Experimental Mathematics 2001;10:53-66.
12. Hayes B. Graph theory in practice. American Scientist 2000;88:9-13 (Part I), 104-9 (Part II).
13. Jeong H, Tombor B, Albert R, Oltvai ZN, Barabási AL. The large-scale organization of metabolic networks. Nature 2000;407:651-4.
14. Watts D. Small worlds: the dynamics of networks between order and randomness. Princeton, NJ: Princeton University Press, 1999.
15.
Watts D, Strogatz S. Collective dynamics of 'small-world' networks. Nature 1998;393:440-2.
16. Mantegna RN, Stanley HE. An introduction to econophysics: correlations and complexity in finance. Cambridge: Cambridge University Press, 2000.
17. Albert R, Barabási AL. Statistical mechanics of complex networks. Reviews of Modern Physics 2002;74:47-97.
18. Barabási AL. Linked. Cambridge, MA: Perseus Publishing; 2002.
19. Boginski V, Butenko S, Pardalos PM. Modeling and optimization in massive graphs. In: Pardalos PM, Wolkowicz H, editors. Novel approaches to hard discrete optimization. Providence, RI: American Mathematical Society; 2003. p. 17-39.
20. Broder A, Kumar R, Maghoul F, Raghavan P, Rajagopalan S, Stata R, Tomkins A, Wiener J. Graph structure in the Web. Computer Networks 2000;33:309-20.
21. Faloutsos M, Faloutsos P, Faloutsos C. On power-law relationships of the Internet topology. Cambridge, MA: ACM SIGCOMM, 1999.
22. Garey MR, Johnson DS. Computers and intractability: a guide to the theory of NP-completeness. New York, NY: Freeman; 1979.
23. Arora S, Safra S. Approximating clique is NP-complete. Proceedings of the 33rd IEEE Symposium on Foundations of Computer Science, Pittsburgh, 1992. p. 2-13.
24. Liao S, Devadas S, Keutzer K, Tjiang S, Wang A. Storage assignment to decrease code size. In Proceedings of the ACM SIGPLAN 1995 conference on Programming Language Design and Implementation, La Jolla (June 1995), ACM Press, pp. 186-95.
25. Aho A, Hopcroft J, Ullman J. The design and analysis of computer algorithms. Reading, MA: Addison Wesley, 1974.
26. Bradley PS, Fayyad UM, Mangasarian OL. Mathematical programming for data mining: formulations and challenges. INFORMS Journal on Computing 1999;11(3):217-38.
27. Rothenberg RB, Potterat JJ, Woodhouse DE, Muth SQ, Darrow WW, Klovdahl AS. Social network dynamics and HIV transmission. AIDS, August 20 1998;12(12):1529-36.
28. Aral SO. Sexual network patterns as determinants of STD rates.
Sexually Transmitted Diseases, May 1999;26(5):262-64.
29. Moore C, Newman MEJ. Epidemics and percolation in small-world networks. Phys. Rev. E 61, 5678-82 (2000).
30. Andersson C, Hellervik A, Lindgren K, Hagson A, Tornberg J. Urban economy as a scale-free network. Phys. Rev. E 68, 036124 (2003) (6 pages).
31. Boyd M. Family and Personal Networks in International Migration: Recent Developments and New Agendas. International Migration Review, Vol. 23, No. 3 (Autumn, 1989), pp. 638-670.
32. Andersson C, Frenken K, Hellervik A. A complex network approach to urban growth. Papers in Evolutionary Economic Geography (PEEG) 0505, Utrecht University, Section of Economic Geography, revised Feb 2005.
33. Yeung YY, Liu TCY, Ng PH. A social network analysis of research collaboration in physics education. American Journal of Physics 2005;73:145.
34. Doreian P, Stokman FN. Evolution of social networks. New York, NY: Gordon and Breach, 1997.
35. Jin EM, Girvan M, Newman MEJ. The structure of growing social networks. Phys. Rev. E 64, 046132 (2001).
36. Newman MEJ. Assortative mixing in networks. Phys. Rev. Lett. 89, 208701 (2002).
37. White HC. Identity and Control. Princeton, NJ: Princeton University Press, 1992.
38. Watts D, Dodds PS, Newman MEJ. Identity and search in social networks. Science 2002;296:1302-05.
39. Ravasz E, Barabási AL. Hierarchical organization in complex networks. Phys. Rev. E 67, 026112 (2003) (7 pages).
40. Kossinets G, Watts D. Empirical analysis of an evolving social network. Science, Vol. 311, No. 5757 (6 January 2006), pp. 88-90.
41. Newman MEJ. Clustering and preferential attachment in growing networks. Phys. Rev. E 64, 025102 (2001) (4 pages).
42. Kumaraswamy P. A generalized probability density function for double-bounded random processes. Journal of Hydrology 1980;46:79-88.
BIOGRAPHICAL SKETCH
The author of this thesis, Mr.
Ashwin Arulselvan, is a master's student at the University of Florida majoring in industrial and systems engineering, with a focus on operations research.