
A Data-Driven Framework for Multi-Dimensional Prediction Processes for WLAN Mobile Users


Material Information

Title:
A Data-Driven Framework for Multi-Dimensional Prediction Processes for WLAN Mobile Users
Physical Description:
1 online resource (83 p.)
Language:
English
Creator:
Kim, Jeeyoung
Publisher:
University of Florida
Place of Publication:
Gainesville, Fla.
Publication Date:

Thesis/Dissertation Information

Degree:
Doctorate (Ph.D.)
Degree Grantor:
University of Florida
Degree Disciplines:
Computer Engineering, Computer and Information Science and Engineering
Committee Chair:
Helmy, Ahmed Abdelghaffar
Committee Members:
Chen, Shigang
Sahni, Sartaj Kumar
Chow, Yuan R
Fang, Yuguang

Subjects

Subjects / Keywords:
wi-fi
Computer and Information Science and Engineering -- Dissertations, Academic -- UF
Genre:
Computer Engineering thesis, Ph.D.
bibliography   ( marcgt )
theses   ( marcgt )
government publication (state, provincial, territorial, dependent)   ( marcgt )
born-digital   ( sobekcm )
Electronic Thesis or Dissertation

Notes

Abstract:
With the proliferation of numerous lightweight devices along with the widespread use of wireless local area networks (WLANs) in many public places, we are connected on the go nowadays more than ever. Such change, in device technology and coverage ubiquity, results in unexplored dynamics and raises several challenging questions. How are these changes affecting the behavior of mobile users? And how do these changes affect mobile user predictability and the networking protocols that utilize it? To shed light on the changes and how protocols involving the mobility of users can change, we follow a systematic analysis methodology. First, using a three-year-long network trace collected from Dartmouth College, we study user mobility and its effects on the predictability of regular and ultra-mobile users, by analyzing the contrast between the mobility of the WLAN users and four carefully selected sets of ultra-mobile users across various mobility metrics. We also investigate how these differences in mobility affect the predictability of such users' next locations. Then, we study the evolution of user mobility using extensive network traces over a period of ten years, collected from two major universities (University of Florida and Dartmouth College), and also investigate a series of prediction methods in order to analyze the evolution of prediction accuracy of these WLAN users. Based on the insights gained from these extensive analyses, we design a novel framework of a multi-dimensional prediction process that aims to improve prediction of mobile WLAN users. We also include a study on two subsets, namely a subset of the smart-phone devices and laptop devices from the UF Fall 2011 trace. This study of user mobility and predictability, followed by our multi-dimensional prediction process framework, paves the way for better understanding of present-day mobile users and aids in better prediction of future WLAN mobile users.
General Note:
In the series University of Florida Digital Collections.
General Note:
Includes vita.
Bibliography:
Includes bibliographical references.
Source of Description:
Description based on online resource; title from PDF title page.
Source of Description:
This bibliographic record is available under the Creative Commons CC0 public domain dedication. The University of Florida Libraries, as creator of this bibliographic record, has waived all rights to it worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law.
Statement of Responsibility:
by Jeeyoung Kim.
Thesis:
Thesis (Ph.D.)--University of Florida, 2013.
Local:
Adviser: Helmy, Ahmed Abdelghaffar.

Record Information

Source Institution:
UFRGP
Rights Management:
Applicable rights reserved.
Classification:
lcc - LD1780 2013
System ID:
UFE0045020:00001


Full Text

PAGE 1

A DATA-DRIVEN FRAMEWORK FOR MULTI-DIMENSIONAL PREDICTION PROCESSES FOR WLAN MOBILE USERS

By

JEEYOUNG KIM

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

2013

PAGE 2

© 2013 Jeeyoung Kim

PAGE 3

To my husband Yong Nam for his love and support, and to my mom and dad for always believing in me. I love you.

PAGE 4

ACKNOWLEDGMENTS

First and foremost, I would like to express my deepest gratitude to my advisor, Prof. Ahmed Helmy, for his continuous support and guidance throughout my Ph.D. degree. With his patience, motivation, enthusiasm, encouragement and immense knowledge, he inspired me to become a better researcher. I could not have asked for a better advisor to work with, and I am truly blessed to have had him as my Ph.D. advisor.

I would also like to thank my supervisory committee members, Prof. Randy Chow, Prof. Shigang Chen, Prof. Sartaj Sahni and Prof. Michael Fang, for their insightful questions, which allowed me to think from different perspectives and arrive at new ideas.

I also thank the many colleagues I had the privilege to work with: Dr. Wei-jen Hsu, Dr. Shao-cheng Wang and Dr. Sapon Tenachaiwiwat for their advice and encouragement as seniors in the lab when I had just joined the NOMADS group; Dr. Udayan Kumar and Dr. Gautam S. Thakur for the stimulating discussions, extensive feedback on my research, and friendship; and my other colleagues Dr. Sungwook Moon, Dr. Saeed Moghaddam, Yibin Wang and Guliz Seray Tuncay for many helpful discussions and suggestions for my research. I would also like to express my gratitude for the support of the CISE graduate advisors, Mr. John Bowers, Ms. Joan Crisman and Ms. Kristina Sapp.

I do not have the words to express my love and gratitude towards my family. To my husband, who loved me through thick and thin: I would have been lost without his patience and encouragement. To my dad, who has always been and will always be my role model. To my mom, for being there whenever I needed her. For both of my parents' unconditional love, support and guidance in life, and for always believing in me no matter

PAGE 5

what. To my little brother, who never fails to make me smile. To my aunts and uncle for all their love. I would not be where I am today if it were not for my family.

I thank all my friends who have been there for me. There are too many to count, but their friendship has always given me the boost that I needed, whether they were near or far. Special thanks go out to all my friends and the priests at the Gainesville Korean Catholic Community for their prayers, spiritual support and warm words that always touched my heart. Thanks for being there when I needed a listening ear and for sharing my faith.

Last but not least, I thank God for granting me this life, for all the joy and the sorrows, and for guiding me through this journey. I have been blessed to have had the opportunity to pursue this Ph.D. degree.

PAGE 6

TABLE OF CONTENTS

(page)

ACKNOWLEDGMENTS .... 4
LIST OF TABLES .... 8
LIST OF FIGURES .... 9
ABSTRACT .... 13

CHAPTER

1 INTRODUCTION .... 15
2 RELATED WORK .... 19
3 DATA SETS AND METRICS .... 22
  3.1. Data Sets .... 22
    3.1.1. Dartmouth College Trace .... 22
    3.1.2. University of Florida Trace .... 24
  3.2. Metrics .... 26
    3.2.1. Mobility Metrics .... 26
      3.2.1.1. Distinct number of APs .... 26
      3.2.1.2. Prevalence .... 26
      3.2.1.3. Activity range .... 27
      3.2.1.4. AP encounter ratio .... 27
    3.2.2. Predictability Metrics .... 27
      3.2.2.1. Markov family of predictors .... 28
      3.2.2.2. Lempel-Ziv algorithm .... 28
4 WLAN USER MOBILITY AND PREDICTABILITY .... 33
  4.1. Mobility Comparison .... 33
  4.2. Predictability Comparison .... 34
5 EVOLUTION OF WLAN USER MOBILITY AND PREDICTABILITY .... 43
  5.1. Evolution of User Mobility .... 43
  5.2. Evolution of User Predictability .... 44
6 FRAMEWORK DESIGN ON MULTI-DIMENSIONAL PREDICTION PROCESSES .... 52
  6.1. Insights and Improvements .... 52
  6.2. Multi-Dimensional Prediction Process Framework .... 53

PAGE 7

    6.2.1. Temporal Expansions .... 55
    6.2.2. Spatial Expansions .... 55
      6.2.2.1. Different granularities of prediction .... 56
      6.2.2.2. Distance error rate .... 57
    6.2.3. Guidelines .... 59
7 CONCLUSION AND FUTURE WORK .... 78
LIST OF REFERENCES .... 80

PAGE 8

LIST OF TABLES

Table (page)

3-1 Data Sets Extracted from Dartmouth Trace .... 29
3-2 Number of APs and Users in each Yearlong Trace .... 29
3-3 Data Characteristics of University of Florida Year Long Data Sets .... 29
3-4 Data Characteristics of University of Florida Semester Long Data Sets .... 30
3-5 Data Characteristics of UF 2011 Fall Identified Smart-phones and Laptops .... 30
5-1 Correlation coefficients for different metrics with Markov O(2) Trace .... 49
5-2 Time evolution on different characteristics from yearly traces at a glance .... 49
5-3 Evolution and correlation between prediction accuracy of Markov O(2) .... 49
6-1 First order statistics on the distance error rate. Including when distance is 0 (which is a hit, Distance >= 0) .... 75
6-2 First order statistics on the distance error rate. Not including when distance is 0 (Distance > 0) .... 75
6-3 First order statistics on the distance error rate. Not including when it is a miss within the same building (Distance > 10) .... 75

PAGE 9

LIST OF FIGURES

Figure (page)

3-1 Illustration of the definition of activity range, shown as the square simulation area representing the geographic extent of the wireless mobile network. .... 31
3-2 Illustration of a user location history of ABCBCADBCA .... 31
3-3 Illustration of how the Markov O(2) will work given the user location history string of ABCBCADBCA .... 32
3-4 Illustration of how the LZ predictor will work given the user location history of string ABCBCADBCA .... 32
4-1 WLAN user prevalence for Dartmouth College Trace .... 37
4-2 VoIP device user prevalence for Dartmouth College Trace .... 37
4-3 Cumulative Probability of Unique number of APs visited in Dartmouth College WLAN Trace .... 38
4-4 Cumulative Probability of Unique number of APs visited in VoIP subset for Dartmouth College Trace .... 38
4-5 WLAN Range Distribution for Dartmouth College Trace .... 39
4-6 VoIP Range Distribution for Dartmouth College Trace .... 39
4-7 Cumulative Probability of Users Prediction Accuracy of Markov O(1) for Dartmouth College Trace .... 40
4-8 Cumulative Probability of Users Prediction Accuracy of Markov O(2) for Dartmouth College Trace .... 40
4-9 Cumulative Probability of Users Prediction Accuracy of Markov O(3) for Dartmouth College Trace .... 41
4-10 Cumulative Probability of Users Prediction Accuracy of LZ for Dartmouth College Trace .... 41
4-11 Comparison of Cumulative Probability of Users Prediction Accuracy of all Predictors for Dartmouth College WLAN Trace .... 42
4-12 Comparison of Cumulative Probability of Users Prediction Accuracy of all Predictors for Dartmouth College VoIP Trace .... 42
5-1 Correlation between the AP encounter ratio and prediction using Markov O(2) on 1000 randomly sampled 02-03 users .... 48

PAGE 10

5-2 Correlation between the AP encounter ratio and prediction using Markov O(2) on 1000 randomly sampled 03-04 users .... 48
5-3 Cumulative Probability of Markov O(2) prediction accuracy for Dartmouth Quarter data sets for data collected between 2002-2006 .... 50
5-4 Cumulative Probability of Markov O(2) prediction accuracy for UF semester data sets for data collected between 2007-2011 .... 50
5-5 Comparison of Cumulative Probability of Markov O(2) prediction accuracy between semester long and year long traces at UF for data collected between 2007-2009 .... 51
5-6 Comparison of Cumulative Probability of VoIP vs WLAN Users from Dartmouth trace 2001-2004 and also UF Fall 2011 smart-phones vs. laptops; this shows that the gap between the two (ultra mobile and less mobile) has gone down from 33% difference to 7% difference .... 51
6-1 Framework of multi-dimensional prediction processes in a nutshell .... 61
6-2 Comparison of the different prediction processes by controlling the input ordered by ascending order of prediction using the Oracle for UF Fall 2007 .... 62
6-3 Improvement of Oracle over UF Fall 2007 sorted by the improvement of sectional in ascending order .... 62
6-4 Comparison of the different prediction processes by controlling the input ordered by ascending order of prediction using the Oracle for UF Spring 2008 .... 63
6-5 Improvement of Oracle over UF Spring 2008 sorted by the improvement of sectional in ascending order .... 63
6-6 Comparison of the different prediction processes by controlling the input ordered by ascending order of prediction using the Oracle for UF Fall 2011 laptops .... 64
6-7 Improvement of Oracle over UF Fall 2011 laptop devices sorted by the improvement of sectional in ascending order .... 64
6-8 Comparison of the different prediction processes by controlling the input ordered by ascending order of prediction using the Oracle for UF Fall 2011 smart-phones .... 65
6-9 Improvement of Oracle over UF Fall 2011 smart-phone devices sorted by the improvement of sectional in ascending order .... 65

PAGE 11

6-10 Comparison of Cumulative Probability of AP and building level Markov O(2) prediction for 01-02 users .... 66
6-11 Comparison of Cumulative Probability of AP and building level Markov O(2) prediction for 02-03 users .... 66
6-12 Comparison of Cumulative Probability of AP and building level Markov O(2) prediction for 03-04 users .... 67
6-13 Comparison of Cumulative Probability of AP and building level Markov O(2) prediction for 05-06 users .... 67
6-14 Cumulative Probability of Prediction Accuracy for 01-02 User Trace .... 68
6-15 Cumulative Probability of Prediction Accuracy for 02-03 User Trace .... 68
6-16 Cumulative Probability of Prediction Accuracy for 03-04 User Trace .... 69
6-17 Cumulative Probability of Prediction Accuracy for 05-06 User Trace .... 69
6-18 Cumulative Probability of Markov O(2) AP Level Prediction for each Yearly Trace at Dartmouth College .... 70
6-19 Cumulative Probability of Markov O(2) Building Level Prediction for each Yearly Trace .... 70
6-20 Empirical CDF showing the distance error rate in meters for 50 random users from UF Fall 2007 trace (distance error rate of 0-500 meters shown) .... 71
6-21 Empirical CDF showing the distance error rate in meters for 50 random users from UF Spring 2008 trace (distance error rate of 0-500 meters shown) .... 71
6-22 Empirical CDF showing the distance error rate in meters (from 0-500) for 50 random users from UF Fall 2011 smart-phone traces .... 72
6-23 Empirical CDF showing the distance error rate in meters (from 0-500) for 50 random users from UF Fall 2011 laptop traces .... 72
6-24 Empirical CDF showing the distance error rate in meters for 50 random users from UF Fall 2007 trace (zoom in and show only distance error rate of 15-500 meters) .... 73
6-25 Empirical CDF showing the distance error rate in meters for 50 random users from UF Spring 2008 trace (zoom in and show only distance error rate of 15-500 meters) .... 73
6-26 Empirical CDF showing the distance error rate in meters for 50 random users from UF Fall 2011 smart-phone trace (zoom in and show only distance error rate of 15-500 meters) .... 74

PAGE 12

6-27 Empirical CDF showing the distance error rate in meters for 50 random users from UF Fall 2011 laptop trace (zoom in and show only distance error rate of 15-500 meters) .... 74
6-28 Correlation between the accumulated history and the AP encounter ratio of 100 random users from UF Fall 2007 .... 76
6-29 Correlation between the accumulated history and the AP encounter ratio of 100 random users from UF Spring 2008 .... 76
6-30 Pseudo code for prediction guidelines derived from the results above .... 77

PAGE 13

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

A DATA-DRIVEN FRAMEWORK FOR MULTI-DIMENSIONAL PREDICTION PROCESSES FOR WLAN MOBILE USERS

By Jeeyoung Kim

August 2013

Chair: Ahmed Helmy
Major: Computer Engineering

With the proliferation of numerous lightweight devices along with the widespread use of wireless local area networks (WLANs) in many public places, we are connected on the go nowadays more than ever. Such change, in device technology and coverage ubiquity, results in unexplored dynamics and raises several challenging questions. How are these changes affecting the behavior of mobile users? And how do these changes affect mobile user predictability and the networking protocols that utilize it?

To shed light on the changes and how protocols involving the mobility of users can change, we follow a systematic analysis methodology. First, using a three-year-long network trace collected from Dartmouth College, we study user mobility and its effects on the predictability of regular and ultra-mobile users, by analyzing the contrast between the mobility of the WLAN users and four carefully selected sets of ultra-mobile users across various mobility metrics. We also investigate how these differences in mobility affect the predictability of such users' next locations. Then, we study the evolution of user mobility using extensive network traces over a period of ten years, collected from two major universities (University of Florida and Dartmouth College), and

PAGE 14

also investigate a series of prediction methods in order to analyze the evolution of prediction accuracy of these WLAN users.

Based on the insights gained from these extensive analyses, we design a novel framework of a multi-dimensional prediction process that aims to improve prediction of mobile WLAN users. We also include a study on two subsets, namely a subset of the smart-phone devices and laptop devices from the UF Fall 2011 trace. This study of user mobility and predictability, followed by our multi-dimensional prediction process framework, paves the way for better understanding of present-day mobile users and aids in better prediction of future WLAN mobile users.

PAGE 15

CHAPTER 1
INTRODUCTION

Wireless LAN (WLAN) traces are an important source of information that allows researchers to have a glimpse into real-life human behavior. A user's WLAN usage pattern may closely relate to real-life human behavior, and constitutes a critical research area in wireless networks. As devices become more portable (such as netbooks, smartphones, tablets, etc.), mobile users' behavior and usage patterns are likely to change. These changes in the nature of the devices, which are portable, easier to carry around, and usable on the go, subsequently allow the WLAN traces to capture more user mobility than ever before. Device portability allows users to be more mobile while using the network than before. Also, access ubiquity allows users to go about their regular routines even while using their mobile devices. Previously, due to device and coverage restrictions, users were forced in many cases to change their behavior and limit their mobility during network usage. How much of these changes will appear in the traces and reflect the users' increased mobility?

We investigate a five-year-long WLAN trace collected over a six-year period (2001-2006) at Dartmouth College [14] and a three-year-long trace collected over a five-year period (2007-2012) at the University of Florida (UF) [13]. For the first part of our study, we focus on a few subsets of wireless users from Dartmouth, which have been systematically selected to be more mobile than the rest of the WLAN users according to various mobility metrics (e.g., access points (APs) visited), from a three-year-long trace (2001-2004). We call these subsets the ultra-mobile data set and call each user in the subsets an ultra-mobile user. One of the subsets consists of VoIP device users. These users leave their devices on most of the time and the devices are light enough to walk

PAGE 16

and talk. Hence, these users show more mobile characteristics than laptop or other heavy device users while connected to the network. We aim to compare the behavior of highly mobile users to the general WLAN users by analyzing these traces. This sheds light on the realism of WLAN trace-based models. We also aim to examine the effect of any differences on protocol performance, e.g., prediction protocols.

When the mobility of the ultra-mobile user subsets is compared to our entire WLAN trace, our results clearly indicate that there is a significant difference in the number of APs visited and the coverage area between ultra-mobile users and general mobile users. But does such a dramatic contrast in mobility affect mobile networking protocols? In order to quantify such an effect, we examine the accuracy of several classes of mobility prediction protocols under various conditions of realistic mobility.

We compare these different sets of traces using several different predictors, including the Markov O(1), O(2), O(3) and also the LZ predictor. Our experiments indicate that the Markov O(2) is the predictor with the highest accuracy among the four predictors and the LZ has the lowest. Surprisingly, all predictors perform quite poorly with ultra-mobile users, with VoIP users having the lowest average correct prediction rate of approximately 25%, compared to 60% for the general WLAN users, while the other ultra-mobile user subsets fall in between these prediction rates. These results prompt revisiting such algorithms for ultra-mobile users. To further investigate the mobility and predictability trends over more recent years, we conduct an evolution analysis for the second part of our study.

In Chapter 5 of our study, we increase the study period span to six years, add another data set collected from UF spanning four years and adopt a few more

PAGE 17

mobility metrics. Mobility is now also measured by the number of distinct and total APs a user has visited and also the user's encounter ratio of such APs. We focus on the Markov chain family of predictors ranging from order 1 through 3 in this study, particularly focusing on Markov O(2), the best performer in our overall study. We find that the number of distinct APs an average user visits increases by approximately 3 APs every year for the Dartmouth trace. We also find that the predictability decreases each year, resulting in a 20% drop in prediction accuracy from the 01-02 trace to the 05-06 trace for Dartmouth and a 30% drop from fall of 2007 to fall of 2011 at UF. This study shows that user mobility is indeed changing over time and that, with the changes in user mobility, there is also a shift in user location predictability over time. Such trends are expected to further manifest themselves in the future with more portable devices.

We have not only done the time evolution analysis over several years of WLAN traces, but we have also explored several different definitions of success for the predictors in order to better understand the results of our analysis. Stemming from the findings and insights gained from the extensive analysis, we have designed a framework that expands the spatiotemporal space for multi-dimensional prediction processes. This is a novel framework: given multiple dimensions of prediction processes obtained by controlling the input or output (or both), we will be able to better predict future WLAN mobile users by choosing from a set of different prediction processes rather than using a single prediction process. We expand the temporal state space by looking at different blocks of time in a day, and the spatial aspect by introducing the distance error rate as well as different granularities of success.
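
To make the Markov family of predictors discussed above more concrete, the following is a minimal Python sketch of an order-k Markov predictor evaluated online over a location history. It is an illustration under stated assumptions and not the dissertation's implementation: the function name markov_accuracy, the tie-breaking rule, and the choice to count an unseen context as a miss are all assumptions made here. The sample string ABCBCADBCA is the user location history named in the captions of Figures 3-2 through 3-4.

```python
from collections import Counter, defaultdict


def markov_accuracy(history, order=2):
    """Online order-k Markov prediction over a sequence of location IDs.

    Returns (correct, attempts). A prediction is attempted at every step
    that has at least `order` previous locations. Assumption: a context
    that has never been seen before yields no prediction and is a miss.
    """
    counts = defaultdict(Counter)  # context tuple -> counts of next location
    correct = 0
    attempts = 0
    for i in range(order, len(history)):
        context = tuple(history[i - order:i])
        actual = history[i]
        attempts += 1
        seen = counts[context]
        if seen:
            # Predict the most frequent successor of this context so far.
            # Assumption: ties go to the successor that was seen first.
            predicted = seen.most_common(1)[0][0]
            if predicted == actual:
                correct += 1
        # Update the model only after predicting, so no future data leaks in.
        counts[context][actual] += 1
    return correct, attempts


if __name__ == "__main__":
    trace = "ABCBCADBCA"  # each letter stands for one access point
    for k in (1, 2, 3):
        hits, tries = markov_accuracy(trace, order=k)
        print(f"Markov O({k}): {hits}/{tries} correct")
```

A history this short mostly shows the mechanics: higher orders need longer traces before their contexts repeat, which is one reason prediction accuracy depends on how much history a user has accumulated.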

PAGE 18

The rest of the document is outlined as follows. Chapter 2 discusses related work and the present literature. Chapter 3 explains the data sets and metrics used throughout this study. In Chapter 4 we investigate WLAN mobile users' mobility and predictability and how they affect each other. In Chapter 5 we discuss our findings on the evolution of mobile users' mobility and predictability over the years. In Chapter 6 we discuss the insights gained and ways to improve the prediction process, and introduce the multi-dimensional prediction process framework and the Oracle. We explore the spatiotemporal expansions using metrics such as the distance error rate, different granularities of success and the time of day at which users are predicted. We also show a case study of smart phones vs. laptops. We conclude and discuss future work in Chapter 7.

PAGE 19

CHAPTER 2
RELATED WORK

Related work can be found in the areas of mobility modeling, location prediction, trace analysis and behavioral mining for WLAN users. Among the various mobility modeling techniques, real-world trace-based modeling is the most realistic and can be called the closest to the ground truth. The work in [2] presents a mobility model based on real WLAN traces, [4] also extracts a mobility model from real user traces, and [30] explores and analyzes a real user trace. Model T [5] and T++ [6] are empirical registration models derived from the WLAN registration patterns of mobile users. They formulate the interdependence of space and time explicitly with a small set of equations. The authors of [1] proposed a mobility model to capture time-variant user mobility. In this model, they define communities that are visited often by the nodes to capture skewed location-visiting preferences, and use time periods with different mobility parameters to create the periodic reappearance of nodes at the same location. The work in [9] also looks into modeling generic WLAN users by identifying the mobility characteristics of individual users. The studies in [29] [32] show that there exist repetitive behavioral trends in the association patterns of groups of users in large WLANs. In [7], researchers studied user mobility patterns and introduced metrics to model user mobility from a four-week trace collected in a large corporate environment. They also analyzed user distribution and load distribution across access points [31]. Most of these works are directly based on WLAN traces, which can be found under the MobiLib project [13] or the CRAWDAD project [14].

PAGE 20

Researchers studied the changing usage of a mature campus-wide WLAN by investigating the workload and usage of the Dartmouth WLAN trace for an 11-week period in 2001 and a 17-week period in 2003-2004 [8]. They discovered that the trace of a mature WLAN (two years old at the time of investigation) showed significant differences from the initial usage of the WLAN, with new devices and applications such as streaming multimedia and P2P services. Mobility is mentioned, but the focus is on the difference in the usage of the WLAN. The study in [3] investigated several domain-independent predictors for location prediction on the WLAN trace, but did not define mobility characteristics or propose any techniques to construct a mobility model. Based on their results, the authors gave some suggestions for the usage of the predictors on WLAN traces.

There are a number of user mobility prediction algorithms [10] [11] in the current literature that target cellular networks. These predictors are used in a different setting and for different purposes (i.e., paging schemes [10], efficient handoff [10] and resource reservation [11]). The differences include, but are not limited to, the fact that the likelihood of a cellular device showing up in a cell a long distance away is very low, so the device is bounded location-wise. There is also work in [12] that aims to improve caching paradigms by analyzing wireless information locality and association patterns on a month-long campus measurement trace.

The characteristics and scale of the predictions in the above literature are different from what we are working on. They aim to improve the performance of wireless infrastructures through load balancing, admission control and resource reservation, whereas we investigate the behavior of WLAN users and how it affects the overall performance of our predictors.

PAGE 21

In our study, we use four predictors that have already been explored in the existing literature to verify how prediction accuracy changes due to changes in user mobility. The order-k Markov predictor and the Lempel-Ziv (LZ) predictor used in our study are well explored and widely used in various fields of study [3] [12] [15] [16] [17] [18]. We discuss our data sets and metrics in Chapter 3.

PAGE 22

CHAPTER 3
DATA SETS AND METRICS

3.1. Data Sets

For our investigations, we use the WLAN traces collected from the Dartmouth College campus and the University of Florida campus. We make the assumption that each device belongs to one unique user, and thus use the terms user and device interchangeably when describing the unique MAC address we find for each device. In our more recent studies, we find that with the proliferation of smart-phones, tablets and other secondary Wi-Fi enabled devices, the number of devices is nearly double the actual population (Table 3-3 and Table 3-4). However, we continue to use the terms user and device interchangeably throughout this study for consistency. While using these traces as our standard, general WLAN user base, we also extract other ultra-mobile user data sets from them, which include VoIP users (from Dartmouth College) and smart-phone and laptop users (from the University of Florida). More detailed descriptions of each of these data sets are given in subsections 3.1.1 and 3.1.2.

3.1.1. Dartmouth College Trace

In our first study (covered in Chapter 4) we use the three-year-long Dartmouth movement trace [14] collected from 2001 to 2004. There are 13888 unique users and 623 different APs (access points) in this particular trace. The VoIP data set we use in this work is a subset of the Dartmouth WLAN trace described above and consists of 97 users. These are obtained by matching the whole WLAN trace against a MAC-to-device-type map, a list that associates each MAC address with its device type based on the first three octets of the address. Among these 97 users we observe two types of VoIP devices, the Cisco 7920 and Vocera devices. We

PAGE 23

have particularly chosen VoIP devices to measure the mobility of WLAN users since VoIP devices are always on and always online, unlike other pocket PCs or PDAs on the market at the time (2001-2004), which may easily go into hibernate mode or may even be turned on and off frequently [8].

Along with the VoIP data set we have generated ultra-mobile test sets from the same traces in order to validate our findings. There are three test sets used in this work and they are all considered to be ultra-mobile users [18] [19]. The ap_200 and ap_170 sets are both based on the number of APs a user has visited. The ap_200 set is a collection of users who have visited 200 APs or more during the length of the trace, and the ap_170 set is a collection of users who have visited more than 170 APs but fewer than 200. The range set is a collection of users who have covered the largest physical area during the length of the trace. This was done by studying the AP location file and calculating the area range that each user has covered. Each of these test sets has approximately 100 users. Table 3-1 shows, at a glance, the characteristics of the different data sets extracted from the Dartmouth College trace that we used in this study.

The ultra-mobile study in Chapter 4 led us to conclude that we do need to investigate how these changes in mobile devices affect user characteristics over time, which propelled us to further investigate the evolution of WLAN users over time in Chapter 5.

For the next part of our study (Chapter 5), we use the three-year-long movement trace collected from Dartmouth College from 2001 to 2004 and the syslog trace that was collected from 2005 to 2006 [14]. There are 13888 unique users and 623 different

PAGE 24

access points (APs) in the former trace and 24399 unique users and 1270 different APs in the latter trace. The drastic change in the number of APs is due to a network infrastructure change at Dartmouth during 2004 to 2005, a transition period from Cisco APs to Aruba APs. While using this trace as our standard general WLAN user base, we also extract other ultra-mobile user data sets from this trace to broaden the spectrum of our study.

We divide the traces into yearlong traces, broken down as follows: the 2001-2004 trace is divided into years running from the first day of July to the last day of June, and the 2005-2006 trace runs from the first day of September to the last day of August. This allows us to investigate the evolution of the characteristics of the users in the WLAN trace over a time span of six years.

As shown in Table 3-2, the number of users grows rapidly over the years. The number of mobile devices has almost doubled each year, which again highlights the importance of such a study in order to better understand the growing and evolving mobile user community. With the rapid propagation of such affordable devices, the mobile user community is no longer a small subset of society. This can be categorized as a characteristic of the wireless network and its users that will continue to grow in the future [8]. In order to minimize the effect that the difference in the number of users has on our studies, we have done some analysis on randomly sampled users as well as on the entire yearlong traces.

3.1.2. University of Florida Trace

The University of Florida (UF) trace [13] [26] was collected from Fall 2007 through Spring 2009, discontinued, and then resumed starting Spring 2011. This is a 3-year-long trace collected over 5 years (and collection is still ongoing). There was a network change that started during Fall 2008, which caused some parts

PAGE 25

of the campus to disappear from the trace during Fall 2008 and Spring 2009. There are some gaps present in the Spring 2009 trace, and trace collection was discontinued from Summer 2009 through Fall 2010. In our evolution study (Chapter 5) we use the whole UF trace divided into different time lengths, namely yearlong traces and semester-long traces. The characteristics of the users for these traces can be found in Table 3-3 and Table 3-4.

The number of devices found in the traces has changed drastically over the years, and from the numbers one can presume that the most likely reason is that the number of devices each individual carries around has changed. According to the factsheet from the University of Florida Office of Institutional Planning and Research, the approximate number of students, faculty and staff combined during these years was between 50,000 and 55,000 per semester (see footnote 1). Thus, we can assume each individual in 2011 is on average carrying 2 devices. This is not a far-fetched idea considering that smart phones have become much more affordable and widespread, so a user will on average have a laptop and a smart phone (or another highly mobile device, e.g. a tablet).

In our framework (Chapter 6) we introduce two subsets of the UF trace, one composed of smart phones and one of laptops. We have conducted a survey to collect the manufacturer-specific first 3 octets of the MAC addresses of smart phone users. Through the survey we were able to identify 30 unique octet prefixes (Organizationally Unique Identifiers, OUIs) from the total of 95 responses received. These surveys were

1 http://www.ir.ufl.edu/

PAGE 26

conducted from December 2011 to January 2012. Table 3-5 shows the data characteristics of these two subsets of users from the UF Fall 2011 trace.

In Chapter 4, we focus on the mobility and predictability of WLAN users. How does one measure mobility? How does one define success in a prediction model? In order to answer these questions, we next discuss the metrics and approaches used to measure the mobility and predictability of WLAN users.

3.2. Metrics

3.2.1. Mobility Metrics

How do we measure mobility? Several metrics can be used, but it is unclear which of them is best suited for our study. We discuss the metrics we used to measure mobility in this subsection.

3.2.1.1. Distinct number of APs

This metric is commonly used in previous work such as [2] [7]. It may seem intuitive; however, we have experimented in a systematic manner in order to verify that the number of unique access points a user has visited can indeed be used to measure mobility. We found that a user who visits only a handful of APs indeed shows less mobility while online than those who visit a large number of APs.

3.2.1.2. Prevalence

Prevalence is a mobility metric proposed in [2] [6] [7] which indicates the time that a user spends at a given AP as a fraction of the total amount of time they spend on the network. Higher prevalence indicates that a user has spent more time at a certain AP and is thus deemed less mobile; lower prevalence means that a user has spent less time at a given AP and is described as more mobile.
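As a minimal sketch of the prevalence definition above (not the code used in this study), the metric could be computed per user as follows. The input layout, a list of (AP, seconds associated) pairs for one user, and the function name are assumptions made for illustration.

```python
from collections import defaultdict


def prevalence(sessions):
    """Return {ap: fraction of the user's total online time spent at that AP}.

    `sessions` is an assumed input format: (ap_id, seconds_associated) pairs
    belonging to a single user.
    """
    time_at_ap = defaultdict(float)
    for ap, seconds in sessions:
        time_at_ap[ap] += seconds
    total = sum(time_at_ap.values())
    if total == 0:
        return {}
    return {ap: t / total for ap, t in time_at_ap.items()}


# Illustrative (made-up) sessions for one user.
sessions = [("AP1", 3600), ("AP2", 600), ("AP1", 1800), ("AP3", 1200)]
p = prevalence(sessions)
print(p["AP1"])  # 5400 of 7200 seconds -> 0.75
```

A user whose largest prevalence value is close to 1 spends almost all online time at a single AP and would be regarded as less mobile under this metric.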

PAGE 27

3.2.1.3. Activity range

The activity range is another mobility metric we investigate. It is defined as the smallest square area that covers all the access points the user has visited in a single activity, where a single activity is the time from when a user logs on to the network until the user logs out of the network. Figure 3-1 illustrates how the activity range is defined: the circles indicate locations and the arrows indicate the user's visitation path in a single activity. The area of the outer square, which is the smallest square covering all the access points, is the activity range.

3.2.1.4. AP encounter ratio

As we extended our study to different data sets in both time (expanded over 10 years) and space (Dartmouth College and UF), we realized that, because the network changes dynamically (network upgrades every 3 years on average, with a large number of APs added at every upgrade), we needed a mobility metric that does not rely only on the absolute number of APs. The AP encounter ratio is the ratio of the number of distinct APs visited to the total number of encounters with APs. This metric reveals a clear and consistent negative correlation between predictability and mobility (Figure 5-1 and Figure 5-2) and is also used as a means to define mobility in our study.

3.2.2. Predictability Metrics

In our study, we used domain-independent predictors to predict the location of a WLAN user. In subsections 3.2.2.1 and 3.2.2.2 we briefly discuss the prediction algorithms we used.
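The two metrics defined in 3.2.1.3 and 3.2.1.4 can be sketched as follows. This is an illustration under stated assumptions rather than the implementation used in this study: the square is assumed to be axis-aligned, AP coordinates are assumed to be planar (x, y) positions in meters, and the function names and sample coordinates are hypothetical.

```python
def ap_encounter_ratio(visits):
    """Distinct APs visited divided by the total number of AP encounters."""
    if not visits:
        return 0.0
    return len(set(visits)) / len(visits)


def activity_range(ap_coords):
    """Area of the smallest axis-aligned square covering all visited APs.

    `ap_coords` holds (x, y) positions, assumed to be in meters, of the APs
    visited during one activity (log-on to log-out).
    """
    xs = [x for x, _ in ap_coords]
    ys = [y for _, y in ap_coords]
    side = max(max(xs) - min(xs), max(ys) - min(ys))
    return side * side


# The location history from Figure 3-2: 4 distinct APs in 10 encounters.
print(ap_encounter_ratio("ABCBCADBCA"))  # 0.4

# Made-up AP positions: x spans 300 m, y spans 400 m, so the side is 400 m.
print(activity_range([(0, 0), (300, 100), (120, 400)]))  # 160000 square meters
```

Because the encounter ratio is normalized by the number of encounters, it does not grow simply because a network upgrade added more APs to the campus.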

PAGE 28

3.2.2.1. Markov family of predictors

We focus on the well-known Markov chain algorithm for our predictors in this study. The Markov chain predictor assumes that a location can be predicted from the current context, which is the sequence of the k most recent symbols in the location history. The predictor is named after k: for example, if we use the 3 most recent symbols (locations), we call the predictor Markov Order 3 and denote it as Markov O(3). The Markov chain model represents each state as a context, and transitions represent the possible locations that follow that context. Thus, if we were to use the Markov O(3) predictor, we would look at the 3 most recent locations of the user and try to predict the next location that the user will visit. In our study, we use the Markov O(1), Markov O(2) and Markov O(3) predictors [3] [12] [17] [18] [19]. Figure 3-3 shows an example of what the Markov O(2) predictor does when given the user location history ABCBCADBCA illustrated in Figure 3-2. Markov O(2) builds its own graph from the location history and tries to predict where the user will move next based on the 2 most recent locations.

3.2.2.2. Lempel-Ziv algorithm

The Lempel-Ziv algorithm is a well-known data compression algorithm which we use as a prediction method. It predicts the next symbol in the sequence from the current state, where the state does not have to correspond to a string of fixed length: the length of the string may vary and is allowed to grow without bound. This is similar to Markov O(k), except that k is not fixed and may grow to infinity [3] [15] [17] [18] [19] [20] [23]. Figure 3-4 shows an example of what the LZ predictor looks like as history builds.
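The order-k Markov predictor described in 3.2.2.1 can be sketched as a table that maps each context (the k most recent locations) to counts of the locations that followed it. The sketch below is illustrative only. The class and function names are hypothetical, k is assumed to be at least 1, a context that has never been seen yields no prediction (counted here as incorrect), and ties are broken in favor of the follower seen first; the study itself may have handled these cases differently.

```python
from collections import Counter, defaultdict


class MarkovPredictor:
    """Order-k Markov location predictor, updated online (illustrative sketch)."""

    def __init__(self, k):
        self.k = k  # assumed k >= 1
        self.table = defaultdict(Counter)  # context tuple -> counts of next location
        self.history = []

    def predict(self):
        """Most frequent follower of the current context, or None if unseen."""
        if len(self.history) < self.k:
            return None
        followers = self.table.get(tuple(self.history[-self.k:]))
        if not followers:
            return None
        return followers.most_common(1)[0][0]

    def update(self, location):
        """Record the observed location as a follower of the current context."""
        if len(self.history) >= self.k:
            self.table[tuple(self.history[-self.k:])][location] += 1
        self.history.append(location)


def prediction_accuracy(sequence, k):
    """Fraction of correct next-location predictions over one location history."""
    predictor = MarkovPredictor(k)
    correct = attempts = 0
    for location in sequence:
        if len(predictor.history) >= k:
            attempts += 1
            correct += predictor.predict() == location
        predictor.update(location)
    return correct / attempts if attempts else 0.0


# Markov O(2) on the location history of Figure 3-2.
predictor = MarkovPredictor(k=2)
for location in "ABCBCADBCA":
    predictor.update(location)
next_location = predictor.predict()  # the context (C, A) was followed by D once
print(next_location)  # D
```

Setting k to 1, 2 or 3 gives the Markov O(1), O(2) and O(3) predictors; a larger k makes contexts more specific but also rarer, so more predictions fall back to "no prediction".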

PAGE 29

Table 3-1. Data Sets Extracted from Dartmouth Trace

Year        Labels (Characters)                                               Number of Users
2001-2004   WLAN (all users in trace)                                         13439
            VoIP (Voice over IP users in trace)                               97
            AP_200 (users visiting more than 200 APs)                         112
            AP_170 (users visiting more than 170 APs and less than 200 APs)   127
            range (users covering the largest physical area range)            113

Table 3-2. Number of APs and Users in each Yearlong Trace

Year        Number of APs   Number of Users
2001-2002   516             2898
2002-2003   554             6370
2003-2004   572             11369
2005-2006   1270            24399

Table 3-3. Data Characteristics of University of Florida Year Long Data Sets

Year        Number of APs   Number of Devices
2007-2008   812             49898
2008-2009   665             81678
2011        1868            158703

PAGE 30

Table 3-4. Data Characteristics of University of Florida Semester Long Data Sets

Semester   Number of APs   Number of Devices
Fa_07      643             49898
Sp_08      760             35600
Su_08      662             38711
Fa_08      585             54332
Sp_09      580             43410
Sp_11      1475            84995
Fa_11      1673            119272

Table 3-5. Data Characteristics of UF 2011 Fall Identified Smartphones and Laptops

Semester    Number of APs   Number of Devices   Device Type    Number of Devices
Fall 2011   1673            119272              Smart-phones   19300
                                                Laptops        23929
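The device subsets in Table 3-5 are identified from the first three octets (the OUI) of each MAC address, as described in Section 3.1. A minimal sketch of such a lookup is given below. The OUI-to-type table contains placeholder prefixes invented for illustration, not the prefixes collected in the survey, and the function names are hypothetical.

```python
HEX_DIGITS = set("0123456789ABCDEF")


def oui(mac):
    """Return the first three octets of a MAC address as 'AA:BB:CC'."""
    digits = mac.replace(":", "").replace("-", "").replace(".", "").upper()
    if len(digits) != 12 or not set(digits) <= HEX_DIGITS:
        raise ValueError(f"not a MAC address: {mac!r}")
    return ":".join(digits[i:i + 2] for i in (0, 2, 4))


def device_type(mac, oui_to_type):
    """Look up the device type for a MAC address; 'unknown' if the OUI is not listed."""
    return oui_to_type.get(oui(mac), "unknown")


# Placeholder prefixes, made up for this example.
oui_to_type = {"00:11:22": "smart-phone", "AA:BB:CC": "laptop"}

print(device_type("00-11-22-33-44-55", oui_to_type))  # smart-phone
print(device_type("aa:bb:cc:00:00:01", oui_to_type))  # laptop
print(device_type("12:34:56:78:9a:bc", oui_to_type))  # unknown
```

Devices whose prefix is not in the table stay unclassified, which is consistent with the identified smart-phones and laptops in Table 3-5 covering only part of the 119272 devices in the Fall 2011 trace.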

PAGE 31

Figure 3-1. Illustration of the definition of activity range, shown as the square simulation area representing the geographic extent of the wireless mobile network.

Figure 3-2. Illustration of a user location history of ABCBCADBCA

PAGE 32

Figure 3-3. Illustration of how the Markov O(2) will work given the user location history string of ABCBCADBCA

Figure 3-4. Illustration of how the LZ predictor will work given the user location history of string ABCBCADBCA
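To make the LZ illustration concrete, the sketch below parses the same location history into LZ78 phrases and keeps a trie of phrase counts from which the next location is predicted. It is a simplified illustration in the spirit of the LZ predictors cited in Section 3.2.2.2, not the implementation used in this study. Locations are assumed to be single characters, and predicting the most frequent child of the current trie node is an assumed prediction rule.

```python
def lz78_phrases(sequence):
    """Split a string of single-character locations into LZ78 phrases."""
    seen, phrases, current = set(), [], ""
    for symbol in sequence:
        current += symbol
        if current not in seen:  # shortest string not seen before ends a phrase
            seen.add(current)
            phrases.append(current)
            current = ""
    if current:  # trailing, already-seen prefix
        phrases.append(current)
    return phrases


class LZPredictor:
    """LZ78 trie with counts; predicts the most frequent child of the current node."""

    def __init__(self):
        self.root = {}         # symbol -> [count, child node]
        self.node = self.root  # trie position within the phrase being parsed

    def predict(self):
        if not self.node:
            return None
        return max(self.node, key=lambda s: self.node[s][0])

    def update(self, symbol):
        if symbol in self.node:  # extend the current phrase
            entry = self.node[symbol]
            entry[0] += 1
            self.node = entry[1]
        else:                    # new phrase ends here; restart at the root
            self.node[symbol] = [1, {}]
            self.node = self.root


print(lz78_phrases("ABCBCADBCA"))  # ['A', 'B', 'C', 'BC', 'AD', 'BCA']

lz = LZPredictor()
for location in "ABCBCADBCA":
    lz.update(location)
print(lz.predict())  # B: at the root, B has started more phrases than A or C
```

Unlike Markov O(k), the context length here is not fixed: it is the depth of the current trie node, which grows as longer phrases are learned.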

PAGE 33

CHAPTER 4
WLAN USER MOBILITY AND PREDICTABILITY

In this study, we compare the mobility characteristics of WLAN traces and ultra-mobile user traces from several different aspects.* We also look into several prediction techniques in order to study how the sharp contrast in mobility between ultra-mobile users and WLAN users affects protocols that use these data sets, such as location prediction (i.e., predicting the next access point the user will visit). The evaluation metrics for mobility in this study are prevalence, the number of distinct access points visited by a given user, and the activity area range. We investigate the Markov O(1), Markov O(2), Markov O(3) and Lempel-Ziv (LZ) algorithms as our prediction methods. The results of the mobility comparison and of the prediction algorithms on each of our sets of traces are shown in sections 4.1 and 4.2.

4.1. Mobility Comparison

Prevalence is one of the mobility metrics proposed in [6] and defined in Chapter 3. Figures 4-1 and 4-2 show that VoIP users are more mobile than WLAN users: the bars are lower, indicating that the users spend less time at a given AP relative to the overall time they spend online. In particular, for the rightmost bar, which indicates prevalence higher than 0.95, the WLAN trace shows a much higher value than the VoIP trace. This can be interpreted as a larger portion of users in the WLAN trace having spent most of their time at only one AP compared to the users in the VoIP trace.

* This chapter is adapted from [19][20][21][23]. For more detail on this work, refer to [19][20][21][23].

PAGE 34

The number of APs a user visits is the second mobility metric we look into. A user who visits only a handful of APs shows less mobility while online than those who visit a large number of APs. Figures 4-3 and 4-4 show the distribution (CDF) of the number of access points visited by each WLAN and VoIP user. With the average number of APs visited being 146 for a VoIP user compared to 36 for a WLAN user, we can see there is a huge difference in mobility. The median number of APs visited was 17 for WLAN users, whereas for VoIP users it was 131. Figure 4-3 clearly shows that more than 70% of WLAN users access fewer than 50 APs, whereas the VoIP users are more evenly distributed and 60% of the population accesses more than 100 APs, as indicated in Figure 4-4.

Figures 4-5 and 4-6 show the activity range distribution for WLAN and VoIP users. The percentage of users with a larger activity range is higher among VoIP users than among WLAN users. As indicated in Figure 4-5, 90% of the user population in the WLAN trace stays inside a 1 square kilometer area, whereas only a little more than 50% of VoIP users stay inside a 1 square kilometer area, as shown in Figure 4-6.

4.2. Predictability Comparison

To study the effect of the sharp contrast in mobility and behavioral characteristics between VoIP and other WLAN users on networking protocols, we apply the set of well-known prediction algorithms explained in Chapter 3 to the various sets of traces in our study, including the subsets that consist of users who have visited more than 200 APs (ap_200), more than 170 but fewer than 200 APs (ap_170), and an area larger than 1 km² (range).
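Statements such as "more than 70% of WLAN users access fewer than 50 APs" are read off an empirical CDF of a per-user quantity. A minimal sketch of how such a distribution could be tabulated is shown below. The per-user AP counts are made-up values for illustration, not values from the Dartmouth trace, and the function names are hypothetical.

```python
from statistics import mean, median


def empirical_cdf(values):
    """Return sorted (value, cumulative fraction of users) points."""
    ordered = sorted(values)
    n = len(ordered)
    return [(v, (i + 1) / n) for i, v in enumerate(ordered)]


def fraction_below(values, threshold):
    """Fraction of users whose value is strictly below the threshold."""
    return sum(v < threshold for v in values) / len(values)


# Made-up distinct-AP counts for ten users.
aps_visited = [5, 12, 17, 36, 48, 75, 131, 146, 210, 260]

print(fraction_below(aps_visited, 50))  # 0.5: five of the ten users visited fewer than 50 APs
print(mean(aps_visited), median(aps_visited))
print(empirical_cdf(aps_visited)[:3])
```

The same helpers apply to the activity range, for example the fraction of users whose range is below 1 square kilometer.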

PAGE 35

We have run the Markov O(1), O(2) and O(3) predictors along with the LZ predictor [3] [12] [17] [18] [19] [23] on each of our test sets, as well as on the VoIP trace set and the whole body of the WLAN trace. We also compared the accuracy of all four predictors on the VoIP trace data to see which one performs best. Accuracy is measured as the percentage of correct predictions of the next AP to visit.

As shown in Figures 4-7 through 4-10, the WLAN trace always has the best prediction accuracy overall for each of the different predictors, with an average of about 60% accuracy. The VoIP trace, by contrast, had the worst prediction accuracy for all the predictors, with an average of approximately 25% accuracy. The ultra-mobile sets ap_170, range and ap_200 had average prediction accuracies of 50%, 47% and 40%, respectively. From Figures 4-7 through 4-10, we can also see that the best accuracy is no more than 80% for VoIP users, while it can exceed 95% for WLAN users.

When we first conducted our experiment, we expected that the range of the physical area each user covered would be a better criterion for measuring mobility than the number of APs visited, since we consider a user to be more mobile when that user covers more ground. Hence, we expected the range set to return poor prediction accuracy. Surprisingly, the range set always exhibits performance between that of the other two test sets (ap_200 and ap_170), which indicates that the users who covered larger areas physically have most likely visited an average of 200 APs during their lifetime. To explain this result, note that, intuitively, the users who had visited fewer APs also had a better prediction rate than the users who had visited more APs. The difference of

PAGE 36

the prediction accuracy between the two data sets (ap_170 and ap_200) is always around 10% near the median.

As for the comparison of the predictors on the VoIP data set and the WLAN trace, shown in Figures 4-11 and 4-12, the LZ predictor showed the worst prediction rate and Markov O(2) showed the best prediction accuracy, by a very small margin over Markov O(1). Markov O(3) did not predict well, and these results indicate that a larger data structure and higher complexity do not help in making better predictions. However, the four predictors used in this study do not provide good prediction for the VoIP data set, although they show a very similar trend regardless of the mobility of the user.

In Chapter 5 we discuss the evolution of WLAN mobile user characteristics over time, and how these changes affect user location predictability.

PAGE 37

Figure 4-1. WLAN user prevalence for Dartmouth College Trace

Figure 4-2. VoIP device user prevalence for Dartmouth College Trace

PAGE 38

Figure 4-3. Cumulative Probability of Unique number of APs visited in Dartmouth College WLAN Trace

Figure 4-4. Cumulative Probability of Unique number of APs visited in VoIP subset for Dartmouth College Trace

PAGE 39

Figure 4-5. WLAN Range Distribution for Dartmouth College Trace

Figure 4-6. VoIP Range Distribution for Dartmouth College Trace

PAGE 40

Figure 4-7. Cumulative Probability of Users' Prediction Accuracy of Markov O(1) for Dartmouth College Trace

Figure 4-8. Cumulative Probability of Users' Prediction Accuracy of Markov O(2) for Dartmouth College Trace

PAGE 41

Figure 4-9. Cumulative Probability of Users' Prediction Accuracy of Markov O(3) for Dartmouth College Trace

Figure 4-10. Cumulative Probability of Users' Prediction Accuracy of LZ for Dartmouth College Trace

PAGE 42

Figure 4-11. Comparison of Cumulative Probability of Users' Prediction Accuracy of all Predictors for Dartmouth College WLAN Trace

Figure 4-12. Comparison of Cumulative Probability of Users' Prediction Accuracy of all Predictors for Dartmouth College VoIP Trace

PAGE 43

CHAPTER 5
EVOLUTION OF WLAN USER MOBILITY AND PREDICTABILITY

Now that we have analyzed the different characteristics of ultra-mobile users and studied the effect they have on prediction, the next question is: in which direction will user mobility evolve in the future?* What is the trend in user mobility and how, in turn, will it affect the location prediction of these users? In Chapter 5, we investigate these matters by first dividing our data set into yearlong data sets as shown in Table 3-2. Then we apply our mobility metrics in order to extract the characteristics and trend over time. We continue to use the four predictors mentioned in Chapters 3 and 4, but with a focus on Markov O(2) since it has consistently shown the best prediction accuracy among the four.

5.1. Evolution of User Mobility

In this part of the study, we use the distinct number of APs a user has visited and the AP encounter ratio, as introduced in Chapter 3, as the mobility metrics. User mobility is difficult to capture with numbers, but in this section we investigate the correlation between our mobility metrics and prediction accuracy to see how closely related they are.

First, in Figures 5-1 and 5-2 we show the correlation between the AP encounter ratio and the prediction accuracy of Markov O(2), which was found to best predict WLAN users among all four predictors in the study in Chapter 4. These two graphs were produced using a thousand randomly sampled users from each of the 02-03 and 03-04 traces [14]. We have quantified the correlation coefficient between the AP encounter ratio and the Markov O(2) prediction accuracy in Table 5-1. As shown, there is a clear and

* This chapter is adapted from [16][17][18]. For more detail on this work, refer to [16][17][18].

PAGE 44

consistent negative correlation between predictability and mobility. The correlation graphs for the other years show similar patterns.

Table 5-2 shows at a glance how the mobility metrics evolve from year to year. The calculated number of users is the number of users actually considered in the calculation of the average and median numbers of access points; these are the users that have enough history accumulated (appearing at least 500 times in the trace) to run the predictors. We can easily see that both the average and the median number of distinct APs are growing, which means the users are showing higher mobility. However, we must keep in mind that the total number of APs can change with network upgrades. This is the case for the 05-06 Dartmouth trace, where the number of APs increased drastically from 572 to 1270 (Table 3-2). With a network change almost every 3 years, there was a need to introduce a new mobility metric, i.e. the AP encounter ratio. In Figure 5-3, we can see that this coincides with our results for the lower prediction accuracy, as described in Chapter 4: the more mobile a user is, the less predictable they are.

In section 5.2, in order to compare the performance of different predictors, we use the prediction accuracy metric, defined as the percentage of correct predictions for each user, and study the change in predictability over the years by looking at the evolutionary trend of the WLAN mobile users.

5.2. Evolution of User Predictability

In Chapter 4 we showed that predictability and mobility in WLAN users have a significant relationship: the more mobile a user is, the less predictable, and vice versa. Among the predictors we used, the predictor with the best performance

PAGE 45

was the Markov O(2) predictor. However, even Markov O(2) had only approximately 60% prediction accuracy, which we did not find satisfactory. Table 5-3 shows a binning of users for each yearlong trace according to their prediction accuracy using Markov O(2) AP-level prediction. It shows the average AP encounter ratio for each user category of prediction accuracy, where the 0~10% category is the group of users who have 10% or less prediction accuracy, the 10~30% category is the group of users who have more than 10% but less than 30% accuracy, and so on. There were no users with prediction accuracy of 10% or lower in the 01-02 and 02-03 traces. We find that the more predictable the user, the lower the average AP encounter ratio. Note that the average distinct number of APs visited is growing at a pace of approximately 3 APs every year, indicating that the users are indeed becoming more mobile over time.

Figure 5-3 shows the Markov O(2) prediction accuracy for the Dartmouth trace divided into quarters. This shows a unique trend in which quarters in the same year are closely coupled, almost overlapping, whereas different years show a 3%-5% drop in prediction accuracy. The prediction accuracy deteriorates as time passes. Each trace is approximately 10 weeks long, which is probably the smallest time period of data we have investigated. We have filtered out data sets that had fewer than 500 prediction-worthy users, which included all the summer quarters along with the 2001 quarters. In Chapter 6, Figure 6-18 shows that the Markov O(2) prediction accuracy for each yearlong Dartmouth trace follows a similar trend of deteriorating over the years. There is a need to investigate why such a unique trend has appeared in this particular group of data, and we will explore that in the future.
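The binning behind Table 5-3 can be sketched as follows: each user is assigned to a prediction-accuracy category, and the AP encounter ratios of the users in each category are averaged. This is an illustrative sketch, not the code used in this study. The text does not say which category a user on a boundary (for example exactly 30%) falls into, so upper-inclusive bins are assumed here, and the sample users are made up.

```python
from collections import defaultdict

BIN_UPPER_EDGES = [10, 30, 50, 70, 90, 100]  # percent, as in Table 5-3


def bin_label(accuracy_pct):
    """Map a prediction accuracy (0 to 100) to its category label, e.g. '10~30%'."""
    lower = 0
    for upper in BIN_UPPER_EDGES:
        if accuracy_pct <= upper:  # assumption: upper edge belongs to the bin
            return f"{lower}~{upper}%"
        lower = upper
    raise ValueError(f"accuracy out of range: {accuracy_pct}")


def mean_ratio_per_bin(users):
    """Average AP encounter ratio per accuracy category.

    `users` is an iterable of (accuracy_pct, ap_encounter_ratio) pairs.
    """
    groups = defaultdict(list)
    for accuracy_pct, ratio in users:
        groups[bin_label(accuracy_pct)].append(ratio)
    return {label: sum(r) / len(r) for label, r in groups.items()}


# Made-up users: (Markov O(2) accuracy in percent, AP encounter ratio).
users = [(8, 0.13), (25, 0.10), (28, 0.12), (65, 0.04), (95, 0.01)]
for label, avg in mean_ratio_per_bin(users).items():
    print(label, round(avg, 4))
```

Categories with no users simply do not appear in the result, which corresponds to the 0 entries reported for the 01-02 and 02-03 traces.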

PAGE 46

We have studied and explored the evolution trend of the Dartmouth College trace. Next we investigate the UF trace. UF being a larger campus than Dartmouth, the trace collected at UF has many more APs and many more devices than the one collected at Dartmouth. The difference between the traces introduces richness and diversity to the data, while the fact that both were collected on university campuses gives the two traces some similar characteristics. Figure 5-4 shows the evolution of the UF traces split into semester-long data sets. Again, the consistent degradation of prediction accuracy over time is present, with the one exception of Spring 2009, which almost overlaps with Fall 2008. Further investigation showed that this anomaly was due to a 6-week gap in the Spring 2009 trace.

How do these traces split into semester-long and quarter-long time periods compare to the yearlong WLAN traces? In Figure 5-5 we show the comparison of the UF semester-long and yearlong traces. The yearly trace has worse prediction accuracy in both 2007-2008 and 2008-2009, by approximately 8% to 20% depending on which semester it is compared with. This is a quantitative confirmation of the intuitive assumption that users on campuses are more predictable from semester to semester than from year to year, because students have different schedules each semester.

Some evolutionary studies can be seen in section 6.2.1, where we discuss the different granularities of location prediction accuracy. The results shown in that study follow the trend shown in this section, further strengthening our argument that WLAN mobile users are evolving towards a less predictable state. This motivates the need to revisit WLAN user location prediction. We also find an interesting evolutionary trend that

PAGE 47

involves the more mobile and less mobile users (i.e., VoIP devices and smart-phones vs. laptops and regular WLAN traces) in Figure 5-6. Although we have validated at length that users are becoming less predictable as time goes by, exactly one subset of users does not fit this pattern: the smart-phone trace. Compared to the VoIP trace (which is also ultra mobile and lightweight, similar to smart-phones), the smart-phones show a prediction accuracy that is better by approximately 18%, with the evolutionary trend going the other way around. The predictability gap between the ultra mobile and less mobile users may be shrinking, and we hypothesize that this is because smart-phones have more capabilities than VoIP devices and may in some ways be taking over tasks that only laptops used to perform (e.g., checking email and surfing the web). It would be interesting to investigate in the future whether this will become a new trend that overrides the current ones.

We have conducted an extensive study using two large data sets collected over a ten-year period and have analyzed the evolution of the mobility and predictability of mobile users. In Chapter 6 we discuss the insights gained from the various systematic analyses we have completed and introduce a novel framework of multi-dimensional prediction processes to better predict future WLAN mobile users.

PAGE 48

Figure 5-1. Correlation between the AP encounter ratio and prediction using Markov O(2) on 1000 randomly sampled 02-03 users

Figure 5-2. Correlation between the AP encounter ratio and prediction using Markov O(2) on 1000 randomly sampled 03-04 users

PAGE 49

Table 5-1. Correlation coefficients for different metrics with Markov O(2)

Correlation with Markov O(2) Prediction Accuracy
Trace                      01-02     02-03     03-04     05-06
Distinct Number of APs     -0.534    -0.504    -0.516    -0.559
AP Encounter Ratio         -0.589    -0.594    -0.502    -0.570

Table 5-2. Time evolution of different characteristics from yearly traces at a glance

                           01-02     02-03     03-04     05-06
Total Number of Users      2898      6370      11369     24479
Total Number of Users      834       2317      3765      2148
Average of distinct APs    50.17     53.67     57.18     64.47
Median of distinct APs     41        44        48        59

Table 5-3. Evolution and correlation between prediction accuracy of Markov O(2)

Users binned according to
prediction accuracy        01-02     02-03     03-04     05-06
0 ~ 10%                    0         0         0.1325    0.0851
10% ~ 30%                  0.1072    0.0998    0.1050    0.1028
30% ~ 50%                  0.0609    0.0653    0.0552    0.0829
50% ~ 70%                  0.0392    0.0337    0.0335    0.0505
70% ~ 90%                  0.0249    0.0204    0.0221    0.0247
90% ~ 100%                 0.0098    0.0084    0.0083    0.0079
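As an illustration of how a coefficient such as those in Table 5-1 could be obtained, the sketch below computes a Pearson correlation coefficient between a per-user mobility metric and the per-user prediction accuracy. That the reported coefficients are Pearson coefficients is an assumption, and the function name and sample values are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations and sums of squared deviations
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-user values: number of distinct APs and prediction accuracy.
distinct_aps = [10, 20, 30, 40]
accuracy = [0.8, 0.6, 0.4, 0.2]
coefficient = pearson(distinct_aps, accuracy)  # negative: more APs, lower accuracy
```

A negative value, as in Table 5-1, indicates that users who visit more distinct APs tend to be predicted less accurately.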

PAGE 50

Figure 5-3. Cumulative Probability of Markov O(2) prediction accuracy for Dartmouth Quarter data sets for data collected between 2002-2006 (axes: Prediction Accuracy of Markov O(2) vs. Probability of User having Prediction Accuracy < X; series: sp02, fa02, wi03, sp03, fa03, wi04, sp04, fa05, wi06, sp06)

Figure 5-4. Cumulative Probability of Markov O(2) prediction accuracy for UF semester data sets for data collected between 2007-2011 (axes: Prediction Accuracy of Markov O(2) vs. Probability of User having Prediction Accuracy < X; series: fa07, sp08, su08, fa08, sp09)

PAGE 51

Figure 5-5. Comparison of Cumulative Probability of Markov O(2) prediction accuracy between semester long and year long traces at UF for data collected between 2007-2009 (axes: Prediction Accuracy of Markov O(2) vs. Probability of User having Prediction Accuracy < X; series: fa07, sp08, su08, fa08, sp09, 2007-2008, 2008-2009)

Figure 5-6. Comparison of Cumulative Probability of VoIP vs. WLAN users from the Dartmouth trace 2001-2004 and of UF Fall 2011 smart-phones vs. laptops. The gap between the two (ultra mobile and less mobile) has gone down from a 33% difference to a 7% difference (axes: Prediction Accuracy of Markov O(2) vs. Probability of User having Prediction Accuracy < X; series: voip_01-04, smartphones_fa_11, all devices_fa_11, laptops_fa_11, wlan_01-04)

PAGE 52

CHAPTER 6 FRAMEWORK DESIGN ON MULTI-DIMENSIONAL PREDICTION PROCESSES

In Chapter 6 we introduce a new process to improve prediction for future mobile users, including smart-phone and highly mobile users. Based on the analysis in Chapters 4 and 5, future WLAN traces are likely to include a larger highly mobile population; thus, the need to improve WLAN mobile user prediction will grow in the future.

6.1. Insights and Improvements

With the ubiquity of wireless networks and the introduction of more capable mobile devices (PDAs, smart-phones, tablets, etc.), users are perceived to be distinctly more mobile. These differences in mobility characteristics (different degrees of mobility) significantly affect user location predictability and will likely affect all protocols that utilize WLAN user traces [26]. As seen in Chapter 5, prediction algorithms performed worse over time, and as seen in Chapter 4, higher perceived user mobility directly resulted in lower predictability. Having investigated two data sets spanning 10 years, we can say with confidence that more mobility is expected in the future. This means that we need to continuously monitor and analyze WLAN traces and mobile users to understand this evolving dynamic, which includes but is not limited to changes in the underlying network, devices and user online behavior. Such insights urge us to revisit and improve prediction for modern-day network users.

We have already shown the need to seek better prediction methods for the WLAN user trace. There are numerous avenues we could explore to improve user location prediction, and we plan to explore a select few of them. There are largely

PAGE 53

two ways to improve predictors without actually changing the algorithm: first, we can control the trace that is fed into the predictor, and second, we can control the prediction process itself. Using the first method, we can explore different temporal dimensions by cutting the input trace so that we predict weekdays only, certain days of the week, or certain times of day, or explore different spatial dimensions, such as calculating the distance error between the predicted location and the actual location or investigating changes in predictability when using different granularities of location (e.g., AP level vs. building level) [16] [17]. Using the second method, we could put different weights on the decision tree branches using data such as the duration for which the user accessed the location, or implement a timer that puts more weight on recently visited locations, so that locations visited longer ago decay and lose weight. Instead of treating a prediction strictly as a hit or a miss, we could try to predict the gray area as well: we could express how much confidence we have in predicting a certain location for a user (e.g., the user will go to location X with a 78% chance).

Based on these insights, we have expanded our study to improve prediction of WLAN mobile users along spatiotemporal dimensions, such as time of day (temporal) and distance error and different granularities of prediction (spatial), by introducing the novel framework of multi-dimensional prediction processes discussed in the following section.

6.2. Multi-Dimensional Prediction Process Framework

In Section 6.2 we introduce a way to better predict future WLAN mobile users, namely the multi-dimensional prediction process framework. This framework was

PAGE 54

motivated strongly by the findings and insights we have gained from the extensive analysis of the 10-year-long WLAN traces collected from Dartmouth and UF.

First we introduce the concept of the Oracle. The Oracle is the all-seeing eye that is one step ahead and already knows what the multi-dimensional prediction processes will predict, and thus always achieves the best prediction accuracy available among the multi-dimensional prediction processes. The Oracle essentially chooses the prediction process that predicts correctly at a given instance of time, and if none of the prediction processes is a hit, the Oracle chooses the most favorable one (e.g., closer distance, same building). Thus, even for the same user, the Oracle may switch between different prediction processes to provide the best possible prediction accuracy given all the prediction processes to choose from.

The multi-dimensional prediction process is shown in Figure 6-1. The input is controlled such that the data fed in is the entire trace, the sectional trace (looking at time blocks of the day), or a trace at a different granularity such as the AP level or building level. There may be numerous ways to control the input data, as discussed in the section above, but we investigate these three in our work. These inputs go through different prediction algorithms. By looking into the history of each user and classifying users in meaningful ways, we derive guidelines that act as a control element, enabling the multi-dimensional prediction process to choose the best predictor for any given user at any given time. As for the output, we look at different granularities (AP level and building level) as well as the distance error in case the chosen predictor makes a wrong prediction.
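The Oracle described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names, the dictionary-based representation of the prediction processes, and the use of a single distance function as the measure of "most favorable" are assumptions, not the implementation used in this study.

```python
def oracle_choice(predictions, actual, distance):
    """Pick the prediction process the Oracle would use for one instance.

    predictions: dict mapping a process name (e.g. "WHOLE", "SECTIONAL")
                 to the location that process predicts.
    actual:      the location the user actually visited next.
    distance:    function (location_a, location_b) -> number, used only when
                 no process is a hit (an assumed notion of "most favorable").
    Returns (process_name, is_hit).
    """
    for name, location in predictions.items():
        if location == actual:
            return name, True
    # No process was a hit: fall back to the prediction closest to the truth.
    best = min(predictions, key=lambda name: distance(predictions[name], actual))
    return best, False


def oracle_accuracy(instances, distance):
    """Fraction of instances for which at least one process is a hit."""
    hits = sum(oracle_choice(preds, actual, distance)[1] for preds, actual in instances)
    return hits / len(instances)


# Hypothetical one-dimensional layout of three locations, for illustration only.
coords = {"X": 0, "Y": 5, "Z": 100}


def toy_distance(a, b):
    return abs(coords[a] - coords[b])


instances = [
    ({"WHOLE": "X", "SECTIONAL": "Y"}, "Y"),  # SECTIONAL hits
    ({"WHOLE": "X", "SECTIONAL": "Z"}, "Y"),  # both miss, WHOLE is closer
    ({"WHOLE": "X", "SECTIONAL": "X"}, "X"),  # both hit
]
```

Because the Oracle needs the actual next location, it serves as an upper bound on what a control element that selects among the same processes could achieve.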

PAGE 55

6.2.1. Temporal Expansions

Figures 6-2, 6-4, 6-6 and 6-8 compare the prediction accuracy of the Oracle, the whole trace and the sectional trace for 100 randomly selected users from 4 different traces collected at UF (Fall 2007, Spring 2008, Fall 2011 smart-phones, Fall 2011 laptops) [24]. The WHOLE trace is our base data: the WLAN trace in its whole form, not expanded in any way. The sectional trace has been expanded temporally by looking at different time blocks of the day for each user. In this case we have looked at four 6-hour time blocks: midnight-6am, 6am-12pm, 12pm-6pm and 6pm-midnight. The sectional trace prediction is made by looking only into the same time block for each user. The Oracle always picks the better of these prediction processes and thus traces an envelope of the best prediction among all the different prediction processes.

Figures 6-3, 6-5, 6-7 and 6-9 show how much improvement the Oracle made over the whole and sectional traces. There is a definite trend in these figures. The improvement has risen drastically over time (from 2007 to 2011): the maximum improvement for the 2007-2008 data sets was 12-13%, whereas that of the 2011 data sets increased nearly threefold, reaching 30-40% for the sectional traces. Following this trend, we expect the Oracle to become increasingly beneficial in the future, as it will help counter the drop in prediction accuracy over time.

6.2.2. Spatial Expansions

Here we discuss the spatial aspects of state expansion by showing the results for different granularities of prediction and for the distance error rate.

PAGE 56

6.2.2.1. Different granularities of prediction

We explore different granularities of success by looking at AP level prediction as well as building level prediction. In Figures 6-10 through 6-13 we compare the AP level and building level prediction accuracy. Prediction at the building level clearly shows a significant improvement over prediction at the AP level. Expanding the spatial dimension of our prediction process gains up to a 20% increase (for 05-06) and up to a 10% increase (for 01-02) in location prediction accuracy over AP level prediction. We can also see that the improvement from AP level to building level prediction grows significantly over time.

Having compared the different levels of prediction granularity, we now study how these different definitions affect the time evolution of user predictability. First, we note that among the predictors we use for each yearly trace, Markov O(2) is clearly still the best prediction algorithm for the traces in our study. Figures 6-14 through 6-17 show that across each yearly trace the Markov O(2) predictor produces the most accurate predictions for the WLAN users. We noted in our experiments that with Markov O(1) prediction, fewer people may have low prediction accuracies than with Markov O(2), but this trend is quickly overturned once the predictor gains enough history. However, a history in which 3 previous steps are followed by a certain step, as Markov O(3) requires, is harder to accumulate; thus, Markov O(2) offers the best tradeoff and performs the best among the three. This repeatedly validates our previous finding that the Markov O(2) predictor is the best among those explored in this study.
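The tradeoff between Markov orders discussed above can be made concrete with a minimal sketch of an order-k Markov predictor evaluated online over one user's sequence of visited locations. The sketch assumes the common formulation in which the predictor returns the location that most often followed the current context of the last k locations; counting an unseen context as a miss (no fallback to a lower order) is an assumption, and the function name is hypothetical.

```python
from collections import Counter, defaultdict


def markov_accuracy(trace, order=2):
    """Online prediction accuracy of an order-`order` Markov predictor.

    trace: sequence of location identifiers (e.g. AP or building names)
           visited by one user, in chronological order.
    For each step the predictor sees only the history up to that step.
    """
    # counts[context][next_location] = how often next_location followed context
    counts = defaultdict(Counter)
    hits = total = 0
    for i in range(order, len(trace)):
        context = tuple(trace[i - order:i])
        actual = trace[i]
        seen = counts[context]
        if seen:
            predicted = seen.most_common(1)[0][0]
            hits += predicted == actual
        # An unseen context yields no prediction and is counted as a miss
        # (an assumption of this sketch).
        total += 1
        seen[actual] += 1
    return hits / total if total else 0.0


# Hypothetical trace of a user cycling through three APs.
toy_trace = ["A", "B", "C"] * 4
```

On a short trace, a higher order needs more steps before each context has been seen once, which mirrors the observation that longer contexts are harder to accumulate.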

PAGE 57

In Figures 6-18 and 6-19 we show the time evolution of the yearly traces from 2001 to 2006 for AP level prediction and building level prediction, respectively. The building level prediction shows smaller differences in prediction accuracy between yearly traces than the AP level prediction. Another interesting trait is that the AP level prediction shows less predictability as time evolves, whereas this is not necessarily the case for the building level prediction. Although the building level prediction accuracies for 02-03 and 03-04 are almost identical, at one point 03-04 actually has better predictability than 02-03. We can also observe that 01-02 has less predictability than both 02-03 and 03-04. Interestingly, the 05-06 trace consistently shows the least predictability at both levels of prediction.

6.2.2.2. Distance error rate

In Section 6.2.2.2, instead of only recording a hit or a miss, we also look at how far off we were when we did miss, by looking at the distance error rate. The distance was calculated by mapping the building coordinates to the APs belonging to each building and then calculating the distance between two buildings. To differentiate an actual hit (distance 0) from a miss that occurred in the same building (which would also have distance 0), we give a miss in the same building a uniform distance of 10 meters, since this is shorter than the shortest distance between two buildings but not so short as to be insignificant. We have looked at 4 different sets of traces (UF Fall 2007, Spring 2008, and Fall 2011 smart-phones and laptops), with 50 randomly selected users for each subset of the UF trace. First we show the empirical CDF graphs in Figures 6-20, 6-21, 6-22 and 6-23, with the user distance error rate from 0 (a hit) to 500 meters. Figures 6-22 and 6-23, which look at the smart-phone and laptop subsets for

PAGE 58

the UF Fall 2011 semester, show a highly dense CDF graph compared to Figures 6-20 and 6-21, which show that of the Fall 2007 and Spring 2008 UF traces. We zoom in on Figures 6-20 through 6-23 by looking at those misses which occur outside of the same building (distance of 15 to 500 meters). Figures 6-24 and 6-25, for the Fall 2007 and Spring 2008 traces, show clusters: the misses most likely occur at the same distances, producing higher and sharper knees, whereas in Figures 6-26 and 6-27 the knees become lower and smaller, creating an almost curve-like graph instead of a step-like one. This means that the misses are very evenly distributed, and when a miss occurs it is unlikely to be at the same distance as other misses. We can assume this comes from the changing dynamics of the WLAN network. The cause may be the network itself, through the introduction of a larger number of APs, or the devices, which, being lightweight, are tightly coupled with users [27] while online and allow the trace to capture mobility better, which leads to lower predictability and the evenly distributed distance error CDF graph.

In Tables 6-1 through 6-3, we have calculated the mean, standard deviation, max and min of the distance error for a subset of the Fall 2007, Spring 2008, laptop and smart-phone traces. The mean and the standard deviation show an overall trend of the distance increasing under all the different conditions (i.e., regardless of whether 0 distances are included). This is an interesting finding: since the campus is going through network changes that add more APs in a limited area, one would expect the distance error for individual instances to go down. However, as Tables 6-1 through 6-3 show, the distance error keeps growing. Even within the same year of users, the Fall 2007 and the laptop trace show more clusters than the Spring 2008 or smart-phone trace

PAGE 59

which are less predictable. There is a strong correlation between the distribution of distances and prediction accuracy, along with user mobility.

6.2.3. Guidelines

Here we discuss an initial set of guidelines we have discovered that aid in improving the prediction accuracy for WLAN mobile users. Note that these guidelines may not directly apply to traces not used in this study. However, the framework and method are generic and can be applied to generate new sets of guidelines as needed for other traces and data sets.

Figures 6-28 and 6-29 show, for 100 random users selected from UF Fall 2007 and Spring 2008 respectively, all the occurrences in which the two processes that we use (sectional and whole) predict different locations but one of them is a hit. In other words, WHOLE predicts that user A at instance t goes to location X next, and SECTIONAL predicts that user A at instance t goes to location Y next. An instance appears in Figures 6-28 and 6-29 only if the correct next location is among the locations predicted by the prediction processes (i.e., X or Y is the correct next location). The x axis shows the amount of history accumulated at the time the instance happens, and the y axis is the AP encounter ratio (i.e., user mobility) at that time.

Figures 6-28 and 6-29 show a clear trend: the more history is accumulated, the smaller the AP encounter ratio, and with less history the AP encounter ratio can go up to nearly 1. They also show that if the history is below 500 and the AP encounter ratio is above 0.1, it is better to go with the SECTIONAL prediction process, and if the AP encounter ratio is below 0.02, it is better to pick WHOLE. With this guideline in place as the control for the framework illustrated in Figure 6-1, feeding into the prediction algorithms, we expect to achieve better prediction accuracy than by choosing one or the other prediction method alone. We provide pseudo code for these

PAGE 60

prediction guidelines in Figure 6-30. Poracle is our proposed prediction oracle implementation (to be validated and improved in future work). In the code it is the next AP (or location) predicted by our proposed oracle.
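A minimal Python sketch of the thresholds stated in Section 6.2.3 (not the pseudo code of Figure 6-30 itself) might look as follows. The function and parameter names are hypothetical, and the behavior in the region that the stated guidelines do not cover, here a configurable default, is an assumption.

```python
def choose_process(history_size, ap_encounter_ratio, default="WHOLE"):
    """Select a prediction process from the guidelines of Section 6.2.3.

    history_size:       amount of history accumulated for the user so far.
    ap_encounter_ratio: the user's AP encounter ratio at this instance.
    default:            process used where the guidelines give no answer
                        (an assumption of this sketch).
    """
    if history_size < 500 and ap_encounter_ratio > 0.1:
        return "SECTIONAL"
    if ap_encounter_ratio < 0.02:
        return "WHOLE"
    return default


def p_oracle(history_size, ap_encounter_ratio, p_whole, p_sectional):
    """Return the next location predicted by the selected process.

    p_whole and p_sectional are the locations predicted by the WHOLE and
    SECTIONAL processes for the same user and instance.
    """
    chosen = choose_process(history_size, ap_encounter_ratio)
    return p_sectional if chosen == "SECTIONAL" else p_whole
```

Unlike the Oracle of Section 6.2, this selection uses only information available before the actual next location is known.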

PAGE 61

Figure 6-1. Framework of multi-dimensional prediction processes in a nutshell

PAGE 62

Figure 6-2. Comparison of the different prediction processes by controlling the input, ordered by ascending order of prediction using the Oracle, for UF Fall 2007 (axis: Prediction Accuracy for different prediction processes using Markov O(2); series: oracle, whole, sectional)

Figure 6-3. Improvement of Oracle over UF Fall 2007, sorted by the improvement of sectional in ascending order (axis: Prediction Improvement of Oracle; series: improvement for oracle whole, improvement for oracle sectional)

PAGE 63

Figure 6-4. Comparison of the different prediction processes by controlling the input, ordered by ascending order of prediction using the Oracle, for UF Spring 2008 (axis: Prediction Accuracy for different prediction processes using Markov O(2); series: oracle, whole, sectional)

Figure 6-5. Improvement of Oracle over UF Spring 2008, sorted by the improvement of sectional in ascending order (axis: Prediction Improvement of Oracle; series: improvement for oracle whole, improvement for oracle sectional)

PAGE 64

Figure 6-6. Comparison of the different prediction processes by controlling the input, ordered by ascending order of prediction using the Oracle, for UF Fall 2011 laptops (axis: Prediction Accuracy for different prediction processes using Markov O(2); series: oracle, whole, sectional)

Figure 6-7. Improvement of Oracle over UF Fall 2011 laptop devices, sorted by the improvement of sectional in ascending order (axis: Prediction Improvement of Oracle; series: improvement for oracle whole, improvement for oracle sectional)

PAGE 65

Figure 6-8. Comparison of the different prediction processes by controlling the input, ordered by ascending order of prediction using the Oracle, for UF Fall 2011 smart-phones (axis: Prediction Accuracy for different prediction processes using Markov O(2); series: oracle, whole, sectional)

Figure 6-9. Improvement of Oracle over UF Fall 2011 smart-phone devices, sorted by the improvement of sectional in ascending order (axis: Prediction Improvement of Oracle; series: improvement for oracle whole, improvement for oracle sectional)

PAGE 66

Figure 6-10. Comparison of Cumulative Probability of AP and building level Markov O(2) prediction for 01-02 users

Figure 6-11. Comparison of Cumulative Probability of AP and building level Markov O(2) prediction for 02-03 users

PAGE 67

Figure 6-12. Comparison of Cumulative Probability of AP and building level Markov O(2) prediction for 03-04 users

Figure 6-13. Comparison of Cumulative Probability of AP and building level Markov O(2) prediction for 05-06 users

PAGE 68

Figure 6-14. Cumulative Probability of Prediction Accuracy for 01-02 User Trace

Figure 6-15. Cumulative Probability of Prediction Accuracy for 02-03 User Trace

PAGE 69

Figure 6-16. Cumulative Probability of Prediction Accuracy for 03-04 User Trace

Figure 6-17. Cumulative Probability of Prediction Accuracy for 05-06 User Trace

PAGE 70

Figure 6-18. Cumulative Probability of Markov O(2) AP Level Prediction for each Yearly Trace at Dartmouth College

Figure 6-19. Cumulative Probability of Markov O(2) Building Level Prediction for each Yearly Trace

PAGE 71

Figure 6-20. Empirical CDF showing the distance error rate in meters for 50 random users from UF Fall 2007 trace (distance error rate of 0-500 meters shown)

Figure 6-21. Empirical CDF showing the distance error rate in meters for 50 random users from UF Spring 2008 trace (distance error rate of 0-500 meters shown)

PAGE 72

Figure 6-22. Empirical CDF showing the distance error rate in meters (from 0-500) for 50 random users from UF Fall 2011 smart-phone traces

Figure 6-23. Empirical CDF showing the distance error rate in meters (from 0-500) for 50 random users from UF Fall 2011 laptop traces

PAGE 73

Figure 6-24. Empirical CDF showing the distance error rate in meters for 50 random users from UF Fall 2007 trace (zoom in and show only distance error rate of 15-500 meters)

Figure 6-25. Empirical CDF showing the distance error rate in meters for 50 random users from UF Spring 2008 trace (zoom in and show only distance error rate of 15-500 meters)

PAGE 74

Figure 6-26. Empirical CDF showing the distance error rate in meters for 50 random users from UF Fall 2011 smart-phone trace (zoom in and show only distance error rate of 15-500 meters)

Figure 6-27. Empirical CDF showing the distance error rate in meters for 50 random users from UF Fall 2011 laptop trace (zoom in and show only distance error rate of 15-500 meters)

PAGE 75

Table 6-1. First order statistics on the distance error rate, including when distance is 0 (which is a hit; Distance >= 0)

All Distance (including hit)      Fall 2007   Spring 2008   Laptop    Smartphone
Mean                              7.09        17.15         23.98     40.69
Standard Deviation                47.87       95.78         121.35    177.12
Max                               2694.09     3241.48       3221.35   6346.67
Min                               0           0             0         0

Table 6-2. First order statistics on the distance error rate, not including when distance is 0 (Distance > 0)

All Misses                        Fall 2007   Spring 2008   Laptop    Smartphone
Mean                              53.84       116.3         108.94    146.37
Standard Deviation                122.02      225.12        240.08    312.07
Max                               2694.09     3241.48       3221.35   6346.67
Min                               10          10            10        10

Table 6-3. First order statistics on the distance error rate, not including when it is a miss within the same building (Distance > 10)

All Misses outside of buildings   Fall 2007   Spring 2008   Laptop    Smartphone
Mean                              177.35      203.52        288.63    290.13
Standard Deviation                190.19      274.56        335.06    399.73
Max                               2694.09     3241.48       3221.35   6346.67
Min                               20.68       20.06         18.34     18.34
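The three views of the distance error in Tables 6-1 through 6-3 can be illustrated with the following Python sketch. It applies the rule from Section 6.2.2.2 that a miss within the same building counts as 10 meters. The data structures, the function names, the assumption that building coordinates are planar and expressed in meters, and the use of the population standard deviation are all assumptions made for illustration.

```python
import math
import statistics

SAME_BUILDING_MISS_M = 10.0  # uniform distance for a miss inside the same building


def distance_error(predicted_ap, actual_ap, ap_building, building_xy):
    """Distance error in meters between a predicted and an actual AP.

    ap_building: dict AP -> building name.
    building_xy: dict building name -> (x, y) in meters (assumed planar).
    """
    if predicted_ap == actual_ap:
        return 0.0  # a hit
    b_pred, b_act = ap_building[predicted_ap], ap_building[actual_ap]
    if b_pred == b_act:
        return SAME_BUILDING_MISS_M
    return math.dist(building_xy[b_pred], building_xy[b_act])


def summarize(errors):
    """Mean, standard deviation, max and min of a non-empty list of errors."""
    return {
        "mean": statistics.mean(errors),
        "std": statistics.pstdev(errors),  # population std is an assumption
        "max": max(errors),
        "min": min(errors),
    }


def three_views(errors):
    """Statistics for all instances, all misses, and misses outside the building."""
    return {
        "all": summarize(errors),
        "misses": summarize([e for e in errors if e > 0]),
        "outside": summarize([e for e in errors if e > SAME_BUILDING_MISS_M]),
    }


# Hypothetical campus with two buildings and three APs.
ap_building = {"ap1": "B1", "ap2": "B1", "ap3": "B2"}
building_xy = {"B1": (0.0, 0.0), "B2": (30.0, 40.0)}
pairs = [("ap1", "ap1"), ("ap1", "ap2"), ("ap1", "ap3")]
errors = [distance_error(p, a, ap_building, building_xy) for p, a in pairs]
```

The three filters correspond to the conditions Distance >= 0, Distance > 0 and Distance > 10 used in the table captions.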

PAGE 76

Figure 6-28. Correlation between the accumulated history and the AP encounter ratio of 100 random users from UF Fall 2007 (x axis: History of the UF Fall 2007; y axis: AP encounter Ratio; series: choosing SECTIONAL over WHOLE, choosing WHOLE over SECTIONAL)

Figure 6-29. Correlation between the accumulated history and the AP encounter ratio of 100 random users from UF Spring 2008 (x axis: History of the UF Spring 2008; y axis: AP encounter Ratio; series: choosing SECTIONAL over WHOLE, choosing WHOLE over SECTIONAL)

PAGE 77

Figure 6-30. Pseudo code for prediction guidelines derived from the results above

PAGE 78

CHAPTER 7 CONCLUSION AND FUTURE WORK

The dynamically changing characteristics of WLAN traces, driven by the introduction of lighter, more mobile devices with increased capabilities in communication, computation, storage and sensing, as well as by changes in the network and in user behavior, are affecting the protocols that use these traces. We show that the changes in the traces and in mobility indeed affect protocols such as user location prediction, through an extensive study and analysis of user mobility, predictability and evolution over 10 years' worth of real-life traces collected from two major campuses. We learned that users are becoming more mobile and less predictable as time goes by.

Based on these findings, we built a multi-dimensional prediction process framework that allows us to better predict future WLAN mobile users, who are expected to become less and less predictable. By introducing multiple dimensions, we provide different prediction processes to this framework, allowing anyone using it to customize the prediction process for users in order to achieve the best prediction accuracy possible based on all the information it has.

For our future work, we plan to extend our framework and also collaborate with a colleague to predict users' web visitation using NetFlow traces [33]. This will be an exciting area to explore, not only because there are many challenges in dealing with such large data (each day's worth of data can be as large as a few hundred gigabytes) but also because it can be used to create better predictive caching, which some applications are already putting to use [25] [28].

PAGE 79

As mentioned numerous times throughout this study, there is a definite need to revisit predictors and any other protocols that use these user mobility traces, which are changing dynamically due to their ever-evolving nature.

PAGE 80

LIST OF REFERENCES

[1] W. Hsu, T. Spyropoulos, K. Psounis, and A. Helmy, "Modeling Time-variant User Mobility in Wireless Mobile Networks," in Proceedings of IEEE Conference on Computer Communications, May 2007, Anchorage, Alaska
[2] C. Tuduce and T. Gross, "A mobility model based on WLAN traces and its validation," in Proceedings of INFOCOM 2005, Miami, FL, USA, pp. 664-674
[3] L. Song, D. Kotz, R. Jain, and X. He, "Evaluating location predictors with extensive Wi-Fi mobility data," in Proceedings of IEEE INFOCOM 2004, Hong Kong, China
[4] M. Kim, D. Kotz, and S. Kim, "Extracting a mobility model from real user traces," in Proceedings of IEEE INFOCOM 2006, Barcelona, Spain
[5] R. Jain, D. Lelescu, and M. Balakrishnan, "Model T: an empirical model for user registration patterns in a campus wireless LAN," in Proceedings of the 11th annual international conference on Mobile computing and networking, August 28-September 02, 2005, Cologne, Germany
[6] D. Lelescu, U. Kozat, R. Jain, and M. Balakrishnan, "Model T++: an empirical joint space-time registration model," in Proceedings of the seventh ACM international symposium on Mobile ad hoc networking and computing, May 22-25, 2006, Florence, Italy
[7] M. Balazinska and P. Castro, "Characterizing Mobility and Network Usage in a Corporate Wireless Local-Area Network," in Proceedings of International Conference on Mobile Systems, Applications, and Services, May 2003
[8] T. Henderson, D. Kotz, and I. Abyzov, "The changing usage of a mature campus-wide wireless network," in Proceedings of the MOBICOM 2004: 187-201
[9] W. Hsu and A. Helmy, "On Modeling User Associations in Wireless LAN Traces on University Campuses," in Proceedings of the Second International Workshop on Wireless Network Measurement (WiNMee 2006), Boston MA, Apr. 2006
[10] H. Zang and J. C. Bolot, "Mining Call and Mobility Data to Improve Paging Efficiency in Cellular Networks," in Proceedings of the MOBICOM 2007: 123-134
[11] J. Chan and A. Seneviratne, "A Practical User Mobility Prediction Algorithm for Supporting Adaptive QoS in Wireless Networks," in Proceedings of IEEE International Conference on Networks, Sep. 1999, Brisbane
[12] F. Chinchilla, M. Lindsey, and M. Papadopouli, "Analysis of Wireless Information locality and association patterns in a campus," in Proceedings of the IEEE INFOCOM 2004, Hong Kong, China

PAGE 81

[13] http://nile.cise.ufl.edu/MobiLib/
[14] http://crawdad.cs.dartmouth.edu/
[15] A. Bhattacharya and S. K. Das, "LeZi-Update: An Information Theoretic Approach to Track Mobile Users in PCS Networks," in Proceedings of the MobiCom 1999
[16] J. Kim and A. Helmy, "The Evolution of WLAN User Mobility and its Effect on Prediction," in Proceedings of ACM IWCMC (International Conference on Wireless Communications and Mobile Computing), pp. 226-231, Istanbul, Turkey, Jul. 2011.
[17] J. Kim and A. Helmy, "Analyzing the Mobility, Predictability and Evolution of WLAN Users," Intl Journal of Autonomous and Adaptive Communications Systems (IJAACS), Vol. 7, Nos. 1/2, 2014. (In press)
[18] J. Kim and A. Helmy, "Analyzing Mobility Evolution in WLAN Users: How predictable are we?," ACM Mobile Computer and Communications Rev (MC2R), Vol. 14, No. 3, pp. 10-12, July 2010.
[19] J. Kim and A. Helmy, "The Challenges of Accurate Mobility Prediction for Ultra Mobile Users," Eurosis Middle Eastern Simulation and Modelling Conference, appeared and published in the European Simulation and Modelling Conference (ESM), Leicester, United Kingdom, Oct. 2009
[20] J. Kim and A. Helmy, "The Challenges of Accurate Mobility Prediction for Ultra Mobile Users," a poster presentation at ACM MobiCom, San Francisco, CA, Sep. 2008 (published in ACM Mobile Computing and Communications Review MC2R Journal)
[21] J. Kim and A. Helmy, "The Challenges of Accurate Mobility Prediction for Ultra Mobile Users," ACM Mobile Computer and Communications Rev (MC2R), V. 13, Issue 3, pp. 58-61, Jul. 2009.
[22] S. Moon, U. Kumar, J. Kim, W. Hsu, and A. Helmy, "Visualization and representation of mobile network users," demo presentation at IEEE SECON, San Francisco, CA, Jun. 2008
[23] J. Kim, Y. Du, M. Chen, and A. Helmy, "Comparing Mobility and Predictability of VoIP and WLAN Traces," CRAWDAD Workshop, Montreal QC Canada, Sep. 2007, held in conjunction with ACM MobiCom.
[24] U. Kumar, J. Kim, and A. Helmy, "Changing Patterns of Mobile Network (WLAN) Usage: Smart-phones vs. Laptops," To appear at Mobile Computing Symposium, Sardinia Italy, Jul. 2013


[25] Paul Mobile App, http://www.inmobly.com/index.php/products/paul-theapp.html
[26] W. Hsu, D. Dutta, and A. Helmy, "Trace: Structural Analysis of User Association Patterns in University Campus Wireless LANs," IEEE Transactions on Mobile Computing, Nov. 2012
[27] A. Oulasvirta, T. Rattenbury, L. Ma, and E. Raita, "Habits Make Smartphone Use More Pervasive," Personal and Ubiquitous Computing, vol. 16, pp. 105-114, 2012
[28] I. Papapanagiotou, E. M. Nahum, and V. Pappas, "Smartphones vs. Laptops: Comparing Web Browsing Behavior and the Implications for Caching," SIGMETRICS Perform. Eval. Rev.
[29] W. Hsu, D. Dutta, and A. Helmy, "CSI: A Paradigm for Behavior-oriented ProfileCast Services in Mobile Networks," Ad Hoc Networks, In Press, 2011
[30] D. Tang and M. Baker, "Analysis of Local-Area Wireless Network," in Proceedings of ACM MobiCom, pp. 1-10
[31] K. Lee, S. Hong, S. J. Kim, I. Rhee, and S. Chong, "SLAW: A New Mobility Model for Human Walks," in Proceedings of the IEEE INFOCOM, Rio de Janeiro, Brazil, 2009
[32] W. Hsu, D. Dutta, and A. Helmy, "Mining Behavioral Groups in Large Wireless LANs," in Proceedings of the 13th MobiCom 2007, Montreal, Canada, pp. 338-341
[33] S. Moghaddam and A. Helmy, "Multidimensional Modeling and Analysis of Wireless Users Online Activity and Mobility: A Neural-networks Map Approach," ACM MSWiM, Nov. 2011


BIOGRAPHICAL SKETCH

Jeeyoung Kim received her Ph.D. from the Computer and Information Science and Engineering department at the University of Florida in Gainesville, Florida in the summer of 2013. She was a research assistant in the Mobile Wireless Networks Design and Testing Group (NOMADS) laboratory under the advisement of Prof. Ahmed Helmy. She received her Master of Science in Engineering from the Computer and Information Science department at the University of Pennsylvania in Philadelphia, Pennsylvania and her Bachelor of Science and Engineering degree from the Computer Science department at Kyungpook National University in Daegu, Republic of Korea. Her research interests lie in the online behavior of mobile wireless network users as observed in real-life WLAN traces, and in analyzing the mobility and predictability of WLAN users.