Citation
Network Analysis of Page Likes from Facebook User Profiles

Material Information

Title:
Network Analysis of Page Likes from Facebook User Profiles
Series Title:
Journal of Undergraduate Research
Creator:
Brauner, Kyle
Kocheturov, Anton
Pardalos, Panos M.
Place of Publication:
Gainesville, Fla.
Publisher:
University of Florida
Publication Date:
Language:
English

Subjects

Genre:
serial ( sobekcm )

Notes

Abstract:
Online social networks such as Facebook and Twitter display topological properties similar to those of small-world networks. Such networks are characterized by a high clustering coefficient, a low average path length, and a degree distribution that follows a power law (i.e., they are representative of a scale-free network). This research aims to show that when the pages that Facebook users prefer, or "like," are represented as a network of nodes connected by the people who like them, the resulting network exhibits the same properties. The calculated properties of this network of Facebook pages, including the average path length, diameter, and clustering coefficient, are consistent with those of small-world networks. Perhaps more importantly, the degree distribution of the resulting graph follows a power law, another major property of many small-world networks. These properties of the network topology may be useful in further experiments, such as attempting to improve the prediction accuracy of age and gender of Facebook users.

Record Information

Source Institution:
University of Florida
Holding Location:
University of Florida
Rights Management:
Copyright by Creator or Publisher. Permission granted to University of Florida to digitize and display this item for non-profit research and educational purposes. Any reuse of this item in excess of fair use or other copyright exemptions requires permission of the copyright holder.

Full Text




University of Florida | Journal of Undergraduate Research | Volume 19 Issue 2 | Spring 2018

Network Analysis of Page Likes from Facebook User Profiles

Kyle Brauner, Anton Kocheturov, Panos M. Pardalos
Center for Applied Optimization, University of Florida

Online social networks such as Facebook and Twitter display topological properties similar to those of small-world networks. Such networks are characterized by a high clustering coefficient, a low average path length, and a degree distribution that follows a power law (i.e., they are representative of a scale-free network). This research aims to show that when the pages that Facebook users prefer, or "like," are represented as a network of nodes connected by the people who like them, the resulting network exhibits the same properties. The calculated properties of this network of Facebook pages, including the average path length, diameter, and clustering coefficient, are consistent with those of small-world networks. Perhaps more importantly, the degree distribution of the resulting graph follows a power law, another major property of many small-world networks. These properties of the network topology may be useful in further experiments, such as attempting to improve the prediction accuracy of age and gender of Facebook users.

INTRODUCTION

Networks are all around us: they include biological networks, such as metabolic networks, and networks developed at the hands of humans, such as the World Wide Web. Many of these networks can be classified as small-world networks. Examples of small-world networks include electric power grids, networks of connected proteins, and neuron networks in the brain. Small-world topology is characterized by dense local clustering and a short average path length between pairs of nodes, which makes it an attractive model for the organization of functional networks because it supports both specialized and distributed information processing (Bassett & Bullmore, 2016). Systems with small-world network properties often display enhanced signal propagation and computational power (Kosinski et al., 2013).
They are also economical in the sense that they tend to minimize wiring costs while supporting high dynamical complexity (Bassett & Bullmore, 2016).

Over the last two decades or so, the online social network has become an increasingly popular type of network. Online social networks connect people and organizations to one another through some sort of online framework or application. The most obvious examples are social networking sites such as Facebook and Twitter.

In this project, a rather unique approach is taken to analyzing the pages that Facebook users prefer. In most of the literature, the networks that are analyzed based on Facebook data treat people as the nodes of the graph, connected by links based on friendship. An alternative approach has been to represent people as nodes connected by the things they like on Facebook (i.e., pages, status updates, comments, etc.). The approach taken here, however, analyzes a network in which Facebook pages (e.g., BMW, art, CNN.com) are represented as nodes connected by the people who like them. This paper provides evidence that the constructed network of page likes displays properties similar to those of many small-world networks.

DATASET EXPLANATION

A dataset consisting of roughly 46.5 million unique user and Facebook page pairings was obtained from the results of the myPersonality project. Launched in 2007 by David Stillwell and Michal Kosinski of the University of Cambridge, myPersonality was a popular Facebook application that allowed users to take a psychometric exam and, with [...] the profile. Various attributes of each user, including age, gender, relationship status, and page likes, were documented. Kosinski et al. (2013) then used the page likes information to attempt to predict private traits and attributes of Facebook users. The dataset includes 221,830 distinct users along with 5,556,502 different pages that were liked among these users.
The graph constructed for this analysis, however, consisted of only the Facebook pages that were liked by at least 100 people. The result was a graph of 46,137 nodes representing the pages. Then, for each possible pair of Facebook pages i and j, two separate values were determined:

n_i = number of users who liked page i
n_{i,j} = number of users who liked both pages i and j

A directed arc in the graph was then drawn from node i to node j if n_{i,j} / n_i ≥ r, where r is the threshold level. Results were obtained at incremental threshold levels between 0.2 and 0.5. The calculations of a few important network properties at various threshold levels are shown in Table 1 below.
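
The construction described above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation: the function name `build_page_graph`, the `(user, page)` input format, and the toy data are hypothetical, and the comparison is assumed to be non-strict (an arc is drawn when the ratio is at least r), which the text does not state explicitly.

```python
from collections import defaultdict
from itertools import combinations


def build_page_graph(likes, r, min_likes=100):
    """Return {page_i: set of page_j} with an arc i -> j when n_ij / n_i >= r.

    `likes` is an iterable of (user, page) pairs; page labels must be sortable.
    The non-strict comparison (>=) is an assumption made for this sketch.
    """
    # Collect the set of pages liked by each user.
    pages_by_user = defaultdict(set)
    for user, page in likes:
        pages_by_user[user].add(page)

    # n[i]: number of users who liked page i.
    n = defaultdict(int)
    for pages in pages_by_user.values():
        for page in pages:
            n[page] += 1

    # Keep only pages liked by at least `min_likes` users.
    kept = {page for page, count in n.items() if count >= min_likes}

    # n_both[(i, j)]: number of users who liked both i and j, stored once with i < j.
    n_both = defaultdict(int)
    for pages in pages_by_user.values():
        for i, j in combinations(sorted(pages & kept), 2):
            n_both[(i, j)] += 1

    # The ratio is asymmetric, so each pair is tested in both directions.
    graph = {page: set() for page in kept}
    for (i, j), both in n_both.items():
        if both / n[i] >= r:
            graph[i].add(j)
        if both / n[j] >= r:
            graph[j].add(i)
    return graph


# Toy data: page A has 4 likes, pages B and C have 2 each.
toy_likes = [
    ("u1", "A"), ("u1", "B"),
    ("u2", "A"), ("u2", "B"),
    ("u3", "A"),
    ("u4", "A"), ("u4", "C"),
    ("u5", "C"),
]
toy_graph = build_page_graph(toy_likes, r=0.5, min_likes=2)
```

Because the ratio is normalized by n_i rather than n_j, the arc from a niche page to a popular page can exist without the reverse arc, which is one plausible reason the in-degree and out-degree distributions of such a graph can differ.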


Table 1. Properties of the largest connected component at various thresholds

Threshold, r   Size (number of nodes)   Clustering coefficient   Diameter   Average path length
0.2            46111                    0.77572                  5          1.989186
0.25           45948                    0.754604                 6          2.028626
0.3            45092                    0.706791                 9          2.126386
0.35           42979                    0.628105                 12         2.317733
0.4            39209                    0.528405                 14         2.613994
0.45           33394                    0.426104                 16         3.00793
0.5            25271                    0.31184                  19         3.480005

DATA CHARACTERISTICS

For each threshold level, three important values that characterize the topology of the largest connected component in the network were calculated: clustering coefficient, diameter, and average path length. The clustering coefficient provides an overall indication of clustering in the network. Consider a network of friends in which people are represented as nodes connected via friendship: a high clustering coefficient implies that the friends of any given user are likely to be friends with one another as well. The average path length is defined as the average number of steps it takes to get from one node of the graph to any other node. It is calculated by finding the mean of the shortest path lengths between all pairs of nodes. The diameter, on the other hand, is the longest of all the shortest paths in the network. In other words, it is the shortest distance between the two most distant nodes.

Perhaps the most interesting results were obtained at threshold level r = 0.2. By analyzing the largest connected component in the graph, a network of 46,111 nodes, we calculated a clustering coefficient of 0.7757, an average path length of 1.9892, and a diameter of 5. Additionally, the in-degree distribution of the directed graph follows a power law for all threshold values analyzed, indicating that this network is a scale-free network, a characteristic shared by many real networks.
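
As an illustration of the three definitions above, the following Python sketch computes them on a small example. It is not the authors' code: it assumes an undirected, connected graph stored as a dictionary of neighbor sets, it uses the average of local clustering coefficients (the Watts-Strogatz definition), and the function names and the toy graph are hypothetical. How the paper treated arc direction in these calculations is not stated.

```python
from collections import deque


def average_clustering(adj):
    """Mean of the local clustering coefficients over all nodes."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue  # nodes with fewer than two neighbors contribute 0
        nbr_list = list(nbrs)
        # Count links that exist among the neighbors of this node.
        links = sum(
            1
            for a in range(k)
            for b in range(a + 1, k)
            if nbr_list[b] in adj[nbr_list[a]]
        )
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)


def bfs_distances(adj, source):
    """Shortest path length (in steps) from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def path_length_stats(adj):
    """Return (average shortest path length, diameter) for a connected graph."""
    total = 0
    pairs = 0
    diameter = 0
    for source in adj:
        for target, d in bfs_distances(adj, source).items():
            if target != source:
                total += d
                pairs += 1
                diameter = max(diameter, d)
    return total / pairs, diameter


# Toy graph: a triangle A-B-C with one extra node D attached to C.
toy = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}
clustering = average_clustering(toy)
avg_path, diameter = path_length_stats(toy)
```

Running one breadth-first search per node is adequate for a toy graph; for a component with tens of thousands of nodes, a dedicated graph library would be the more practical choice.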
A chart of the in-degree distribution at threshold 0.2 is shown in the log-log plot in Figure 1 below.

Figure 1. In-degree distribution for graph at threshold r = 0.2

The fact that the degree distribution follows a power law is perhaps the most important finding. However, it is interesting that the out-degree distribution at a threshold of 0.2, shown in the log-log plot in Figure 2, does not follow a power law. This is unusual because it is the only property of the network that differs from those of other small-world, scale-free networks. Still, this atypical out-degree distribution does not seem to affect other properties of the small-world topology.

Figure 2. Out-degree distribution for graph at threshold r = 0.2

CONCLUDING REMARKS

These results are significant because many of the values calculated for threshold r = 0.2, in comparison to those of the other thresholds, closely follow the properties of small-world networks. Small-world networks are described as networks that are highly clustered, like regular lattices, have small average path lengths, and exhibit the characteristics of a scale-free network (Watts & Strogatz, 1998). Small-world topology has been seen in some neuronal networks of the brain, which has enabled further research in the fields of seizure generation and short-term memory loss (Netoff et al., 2004). As shown in Table 1, the highest clustering coefficient and lowest average path length among all the observed thresholds both occur at r = 0.2. At this threshold, any pair of Facebook pages i and j requires a smaller ratio of the number of users who liked both pages i and j to the number of users who liked page i for an edge to be drawn from i to j than at the other threshold values. Thus, as the threshold is lowered, more pages are represented in the network and more connections are made between the pages, but these connections are weaker.
As the threshold is increased, however, there is a greater probability that the same people like two connected pages, and so the connections become stronger and more valuable. Furthermore, as the threshold is increased, both the out-degree and in-degree distributions begin displaying a power-law structure, at the cost of fewer pages being represented in the network. Hence, there is a tradeoff between the number of pages represented in the network and the value of the connections among them. The log-log plots of the in-degree and out-degree distributions at threshold 0.5 are shown in Figures 3 and 4, respectively. The out-degree distribution in particular seems to follow more of a power-law structure when compared to that of threshold 0.2.

Figure 3. In-degree distribution for graph at threshold r = 0.5

Figure 4. Out-degree distribution for graph at threshold r = 0.5

[Figures 1 through 4 are log-log plots of the number of nodes with in-degree (or out-degree) k against the number of incoming (or outgoing) links, k.]

FUTURE DIRECTIONS

One major goal of this project is to use the structural properties of this network to possibly improve upon the prediction of user attributes such as age, gender, and number of Facebook friends (Kosinski et al., 2013). One approach could be to apply clustering analysis to separate the network into groups, or cliques. Then, for each clique, the most common traits among the users who liked the pages in that clique could be determined. After assigning each user to a clique based on the pages they liked on Facebook, the most common traits within that clique could be used for prediction. To do this, the optimal threshold value would likely have to be found. Ideally, as many pages as possible would be incorporated into the graph without sacrificing the strength of the connections between them.

Another possible research direction is to understand why the out-degree distribution differs from the in-degree distribution at lower thresholds yet does not seem to affect other properties of the network topology. This difference could be due to the manner in which the network was constructed, so alternative construction methods could be pursued.

REFERENCES

Bassett, D.S., Bullmore, E. (2016). Small-world brain networks. The Neuroscientist, 12(6), 512-523.
Kosinski, M., Stillwell, D., Graepel, T. (2013). Private traits and attributes are predictable from digital records of human behavior. Proceedings of the National Academy of Sciences (PNAS).
Netoff, T.I., Clewley, R., Arno, S., Keck, T., White, J.A. (2004). Epilepsy in small-world networks. J Neurosci, 24, 8075-8083.
Watts, D.J., Strogatz, S.H. (1998). Collective dynamics of 'small-world' networks. Nature, 393, 440-442.