MassMine: collecting and archiving big data for social media humanities research University of Florida
Grant proposal
Beveridge, Aaron
de Farber, Bess
Dobrin, Sidney
Gitzendanner, Matthew
Taylor, Laurie N.
Van Horn, Nicholas
George A. Smathers Libraries, University of Florida
Gainesville, FL
The MassMine project team representing participants from the Department of English, George A. Smathers Libraries (Libraries), and Research Computing at the University of Florida (UF) request $60,000 to finish the version 1.0 release, establish a robust training program, and promote the MassMine open source software. MassMine enables researchers to collect their own social media data archives and supports data mining, thus providing free access to “big data” for academic inquiry. MassMine further supports researchers in creating and defining methods and measures for analyzing cultural and localized trends, and developing humanities research questions and data mining practices. The primary aims of this project are to: 1) refine the MassMine tools to support collection, acquisition, and use of available social media and web data; and, 2) develop a training program with online resources for supporting the broad use of MassMine by humanities researchers, regardless of experience.
MassMine Software by Nicholas M. Van Horn and Aaron Beveridge released under GNU GENERAL PUBLIC LICENSE, Version 3, 29 June 2007, see:

University of Florida
This item is licensed with the Creative Commons Attribution License. This license lets others distribute, remix, tweak, and build upon this work, even commercially, as long as they credit the author for the original creation.
TABLE OF CONTENTS
List of Participants
Abstract and Statements of Innovation and Humanities Significance
Narrative
  Enhancing the Humanities through Innovation
  Environmental Scan
  History and Duration of the Project
  Work Plan
  Staff
  Final Product and Dissemination
Project Budget
  Budget Form
  Budget Narrative
  Federally Negotiated IDC Rate Agreement
Biographies
Data Management Plan
Letters of Commitment
Letters of Support
Appendices
  Detailed Work Plan
  Selected References and Resources
  Humanities Research Questions Needing MassMine's Data Collection, Curation, and Analysis Functionalities
  Workshop Handout: Using MassMine on University of Florida's Research Computing Cloud Server


Participants
Alteri, Suzan (University of Florida)
Beveridge, Aaron (University of Florida)
Clapp, Melissa (University of Florida)
Dobrin, Sidney I. (University of Florida)
Freeman, Richard (University of Florida)
Gitzendanner, Matthew (University of Florida)
Hart-Davidson, William (Michigan State University)
Kidd, Kenneth (University of Florida)
Martin, Cathlena (University of Montevallo)
Morey, Sean (Clemson University)
Rice, Jeff (University of Kentucky)
Taylor, Laurie N. (University of Florida)
Van Horn, Nicholas (Ohio State University)


MassMine: collecting and archiving big data for social media humanities researchers
University of Florida

Abstract and Statements of Innovation and Humanities Significance

Abstract
The MassMine project team representing participants from the Department of English, George A. Smathers Libraries (Libraries), and Research Computing at the University of Florida (UF) request $60,000 to finish the version 1.0 release, establish a robust training program, and promote the MassMine open source software. MassMine enables researchers to collect their own social media data archives and supports data mining, thus providing free access to “big data” for academic inquiry. MassMine further supports researchers in creating and defining methods and measures for analyzing cultural and localized trends, and developing humanities research questions and data mining practices. The primary aims of this project are to: 1) refine the MassMine tools to support collection, acquisition, and use of available social media and web data; and, 2) develop a training program with online resources for supporting the broad use of MassMine by humanities researchers, regardless of experience.

Statement of Innovation
Humanities researchers currently lack sufficient access to social media data, tools for data mining, and tools for processing data for analysis. MassMine is open source software, developed by humanists for the needs of humanists, that addresses these concerns by providing a set of easy-to-use tools for creating social media data archives, querying and mining those archives, and revealing the processes and technologies that enable the generation of new methods and new questions.

Statement of Humanities Significance
MassMine's version 1.0 release will enable new approaches to small and big data for humanists by creating access to data with tools for data mining, processing, and analysis. This project will result in a powerful data tool for humanists with a simple GUI. Using an iterative development process in collaboration with humanities scholars, the project will also produce training resources and tool documentation for MassMine that increase capacity for data-intensive research in the humanities.


Enhancing the Humanities through Innovation

The MassMine project team representing participants from the Department of English, George A. Smathers Libraries (Libraries), and Research Computing at the University of Florida (UF) request $60,000 to finish the version 1.0 release, develop a robust training program, and promote the MassMine open source software. MassMine enables researchers to collect their own social media data archives and supports data mining, thus providing free access to “big data” for academic inquiry. MassMine further supports researchers in creating and defining methods and measures for analyzing cultural and localized trends, and developing humanities research questions and data mining practices. The primary aims of this project are to: 1) refine the MassMine tools to support collection, acquisition, and use of available social media and web data; and, 2) develop a training program and corresponding online resources for supporting the broad use of MassMine by humanities researchers, regardless of experience.

The MassMine project is a Level II start-up grant proposal to develop MassMine for broader scale implementation to support humanities research needs for social media data collection, mining, and analysis. The MassMine developers have recognized and are responding to the importance of access for humanities data research. Humanities researchers currently lack sufficient access to social media data, further entrenching a digital divide.[1] Humanities researchers must innovate more accessible tools for data mining, processing, and analysis. As Adam J. Banks explains, technological access is a socio-cultural concern. Banks defines technological access as material, functional, meaningful, and transformational. Material or physical access becomes gatekeeper because access to it is a prerequisite to the basic knowledge that is required to utilize technology.[2] Because of licensing limitations and advanced technical skill demands, humanists have been restricted from meaningful and transformative research using social media and web data. MassMine's initial and ongoing development seeks to address specific concerns of access and use within humanities research practices.

Currently, MassMine is a console application accessed using a command line interface (Apple, Linux, Windows) with which users input a basic configuration file that controls the data collection and data processing functionality. Users can run the MassMine console on standalone computers for individual research or on cloud-style servers. MassMine is operating successfully on UF Research Computing's servers, with MassMine accessible by UF researchers and collaborators at other institutions. MassMine version 0.1.0 code is available as open source on GitHub.

The MassMine Startup project will provide a set of easy-to-use tools and a training program for humanists to create social media data archives, query and mine the archives, and engage with processes and technologies for generating new methods and questions.[3] The MassMine Startup project seeks funding for a software programmer, cloud server hosting, and training program design to: 1) develop a GUI; 2) build the Export & Processing Module; and, 3) implement a full training program for humanities researchers using MassMine to conduct data research. The MassMine GUI will utilize the same underlying console engine, ensuring parallel capabilities for the console and GUI versions. Currently, MassMine uses data frames to store information and supports data collection from the Twitter and Google APIs as well as sets of user-supplied URLs. Facebook and Wikipedia APIs will be added as data sources by January 2015.
MassMine currently stores raw social media data and web data that has useful additional information attached (e.g., timestamps, geolocation data). Extraneous data is also attached (e.g., HTML, markup, punctuation, irrelevant URLs, nonspecific attached data), and this unrelated data can impede optimal analysis. The planned Export & Processing Module will add support for storing data in SQL and MongoDB database formats, exporting data to additional formats (CSV, TXT, and XLSX), and functionality for data curation (e.g., reviewing, cleaning, and subsetting data).[4] Data curation is a critical part of the data research process and can consume the majority of time on any given research project.[5] With the technical enhancements, MassMine will be a comprehensive tool for collecting, processing, and exporting data to enable greater access for humanists in developing data research questions, undertaking data research, collecting data, analyzing data, and informing broader interdisciplinary data-intensive methodologies initiated by the humanities.

[1] Boyd, D. & Crawford, K. Critical Questions for Big Data. Information, Communication, and Society, 2012.
[2] Banks, A. Race, Rhetoric, and Technology: Searching for Higher Ground. Urbana, IL: NCTE, 2006.
[3] Friedlander, A. Asking Questions and Building a Research Agenda for Digital Scholarship. Washington, D.C.: Council on Library and Information Resources, 2009.
[4] Ogburn, J. The Imperative for Data Curation. portal: Libraries and the Academy, 10(2), 2010.

Social media postings are significant resources for humanities scholarship, comprising the text of postings with valuable information attached. Geolocation information and access dates support analysis of the real-world locations where texts have connected and moved, including tracking circulation. Social media postings often encompass forms beyond textual data, including videos, static and animated images, and memes that combine text and image elements. Humanistic modes of inquiry can be productively brought to bear on these materials, and should rely on textual practices and methods to inform the implementation of humanities data-intensive research.

Despite the value of social media postings, humanities research opportunities are currently limited because social media postings are often controlled by user licensing or service agreements, which restrict access. Humanities researchers face three connected problems when attempting to study social media and web data: 1) high cost of access to data resellers who package and charge for licensed data from social media postings; 2) purchased data access is not for raw data; and, 3) purchased data is most often pre-filtered, already-visualized, and made available through limited browser tools focused on marketing and brand management needs.
These problems can be avoided by accessing data through APIs (application programming interfaces), but accessing and collecting data via APIs, especially for systematic data processing and exporting, requires coding knowledge. Data obtained through APIs generally remains under licensing or user agreement restrictions that limit display, sharing, and certain types of data usage. Various tools support parts of this data research process; however, tools with integrated support for data collection, query development, and data mining processes are unavailable. Whereas the sciences must consider data practices in terms of the sharing and provenance of data sets that scientists have already acquired,[6] the humanities require a more comprehensive approach.[7]

The MassMine Startup project will support researchers as they create their own data archives in a manner that complies with permissible uses of APIs. MassMine is designed to leverage existing APIs for large-frame support. With the accompanying training program, researchers will have access to data research tutorials, including training on conforming to the acceptable-use terms of data providers as well as ethical considerations regarding privacy practices for social media and web research.

MassMine's Project Team uses open and collaborative development models, which recognize humanities scholars as core users who are integral to ongoing design and expansion. The Project Team uses a grounded approach to development: it engages users to identify needs and conducts iterative development following the needs and concerns expressed by humanities researchers. Because humanists have had such limited access to data and data tools, training is an essential part of the proposed project. The project will include a comprehensive training program that begins with creating a Scholars Group of humanities scholars who will initially use the console application, with Project Team support as they develop research questions, collect data, analyze data, and inform broader data research practices. The Scholars Group will serve as a core cohort of humanities researchers who will respond to feature and GUI development through their feedback during and following training sessions with the Project Team. In the process of supporting MassMine's development, the Scholars Group will pursue their individual research with the complementary goal of producing publishable data analyses and visualizations. Following the release of the GUI version, the training program will expand by offering open sessions focused on different interests and needs, including teaching data research in the classroom, training for humanists about data research, and focused training for researchers in Children's Literature.

[5] Steve Lohr. For Big Data Scientists, Janitor Work is Key Hurdle to Insights. New York Times, Aug. 17, 2014.
[6] Szalay, A. & Gray, J. The World Wide Telescope. Science, Sept. 2001.
[7] American Council of Learned Societies. "Our Cultural Commonwealth: The Report of the American Council of Learned Societies Commission on Cyberinfrastructure for the Humanities and Social Sciences." NY: ACLS, 2006.


Research about Children's Literature and popular culture provides abundant examples of research questions related to books, films, toys, games, and related user communities and activities. Many of these have widespread community responses to authors and works. The varied training sessions will be opportunities for conducting outreach and gathering user feedback to inform MassMine's development and training offerings. The GUI release, added functionality, and training program will position MassMine to provide a scalable foundation for humanities scholars interested in employing social media and web data for research.

MassMine 1.0 will support many scholarly approaches (see examples in appendix), including Children's Literature and transmedia approaches. For example, Mary Roca is interested in studying how Mattel's products and interactive materials offer narrative content and directions for play, while also functioning as scripts for consumers using social media. She explains that Mattel attempts to control its brand while mining its fans for new products and content. By using MassMine 1.0, Roca will be able to study how Mattel employs a franchise management style to promote its narratives, investigating Monster High and Ever After High for how Mattel's social media activity, in combination with the related consumer products, provides data on how branded fiction promotes a consumption-based American girlhood.

As another example, researchers studying particular countries and events will be able to use MassMine as a tool to collect and analyze events associated with particular locations. Petrine Archer and Claudia Hucke created the online exhibition About Face: Revisiting Jamaica's First Exhibition in Europe to celebrate Jamaica's 50th Anniversary of Independence by revisiting Face of Jamaica, the country's first post-independence exhibition to tour Europe. Face of Jamaica toured Europe in 1963-1964 yet was never viewed in Jamaica. About Face reconsidered the original exhibition by representing its art and related materials online. Using MassMine, scholars will be able to collect social media texts on current reception of artists and artworks featured in this nation-defining exhibit to inform research on particular artists, culture, and nationalism.

In addition to supporting the research contained in the scholarly contributions, MassMine version 1.0 will enable data research in the classroom with minimal time and technology requirements. Committed to building a strong foundation of data literacy in the humanities, MassMine will benefit teaching by providing an easy-to-use interface that lets humanities courses include data research without programming or coding skills. Rather than spending too much time collecting and archiving data, classes will be able to focus on framing research questions and analyses. MassMine is designed to operate long-term for large data collection projects, but it is also effective for short-term projects within a single semester, thus introducing students in the humanities to data-intensive methodologies.

Environmental Scan

The Digital Research Tools (DiRT) Directory contains many entries on tools for data collection and analysis. However, available tools do not adequately respond to the comprehensive need for tools that assist in data collection, archiving, querying, and analyzing for general humanities scholars.
Programming or coding skills are often mandatory, along with funds for purchasing data, and many tools are designed to support data research without considering the limited technical support provided to humanities scholars. Tools like TAGS: Twitter Archiving Google Spreadsheet and programming guides like the Programming Historian support users with technical skills for data collection and research. Commercial data collection and analysis providers (e.g., Radian 6 by SalesForce, GNIP by Twitter, Topsy by Apple, SumAll) offer free trial periods followed by expensive payment plans with certain limits and data presets that focus on business analytics and marketing research, which cannot be used to study many of the diverse questions posed by humanities researchers. Building from the humanities' long history of textual analysis, recent innovative work includes text analysis by data mining large research archives. Data mining tools for already collected data or pre-existing archives (e.g., WordSeer, MALLET, Hathi Trust Research Center) often allow users to upload large corpora for indexing, text parsing, topic modeling, information retrieval, and machine learning. Tools for specific needs include tools that support necessary data curation processing (e.g., Open Refine), analysis and visualization of network structures (nodes, edges, connections) (e.g., NodeXL), and certain levels of data access and analysis for specific data sources (e.g., Webometric Analyst, Mozdeh for Twitter). Resources also include trainings, as with the Digital Humanities Data Curation Institutes (2013-2014). MassMine's version 1.0 release will mark a significant contribution by providing improved access to multiple data sources, by eliminating the need for advanced technical skills in data collection and processing, and by providing a robust training program specifically developed to support humanities research.

History and Duration of the Project

Development on MassMine began in response to data access problems and needs unmet by available tools. Development began in late 2013 to support research about complex circulation networks[8] by investigating the concept of hypercirculation[9] using Twitter to study the relationship of trends to respective locations and content. The ubiquitous R language was chosen as the coding environment for its well-supported programming package for accessing the Twitter API, which supplied the Twitter trend data needed to approach theoretical problems posed by theories of hypercirculation. Since the Twitter API provides limited historical data, more data was needed for circulation studies, which informed the requirements for MassMine. MassMine was coded to access the API at the maximum allowable bandwidth, archive new data as it became available, and continue to collect systematically over long periods of time to allow scholars to build large research data archives with minimal technical demands.

MassMine's current 0.1.0 release supports the Twitter API, Google API, and aggregate sets of web page URLs provided by the user. The console application does not require programming knowledge for users to collect data. The application is controlled with a configuration file that is basic text, so users can edit it with any text editor.
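The collection strategy described above (poll the API at the permitted rate, archive each new batch as it arrives, and keep going over long periods) can be illustrated with a minimal sketch. This is not MassMine's code, which the proposal describes as written in R; the function `collect`, the stand-in `fetch` callable, the JSON-lines archive format, and the rate of 15 requests per 900 seconds are all assumptions chosen for illustration only.

```python
import json
import os
import tempfile
import time


def collect(fetch, archive_path, requests_per_window, window_seconds,
            total_requests, sleep=time.sleep):
    """Poll `fetch` at an even pace and append unseen records to a JSON-lines file."""
    # Spacing requests evenly keeps the collector at, but not over, the allowed rate.
    interval = window_seconds / requests_per_window
    seen_ids = set()
    written = 0
    with open(archive_path, "a", encoding="utf-8") as archive:
        for request_number in range(total_requests):
            for record in fetch():
                if record["id"] in seen_ids:
                    continue  # already archived during this run
                seen_ids.add(record["id"])
                archive.write(json.dumps(record) + "\n")
                written += 1
            if request_number < total_requests - 1:
                sleep(interval)  # wait before the next permitted request
    return written


def demo():
    """Run the collector against a hypothetical data source without real waiting."""
    batches = iter([
        [{"id": 1, "text": "first"}, {"id": 2, "text": "second"}],
        [{"id": 2, "text": "second"}, {"id": 3, "text": "third"}],
    ])
    pauses = []  # records the requested pauses instead of sleeping
    path = os.path.join(tempfile.mkdtemp(), "archive.jsonl")
    written = collect(lambda: next(batches), path,
                      requests_per_window=15, window_seconds=900,
                      total_requests=2, sleep=pauses.append)
    with open(path, encoding="utf-8") as archive:
        line_count = len(archive.readlines())
    return written, line_count, pauses
```

Opening the archive in append mode means a collection run adds to an existing file instead of overwriting it, which suits the long-running collection the proposal describes. A real collector would also need to remember previously archived records across restarts.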
To use the MassMine console application, users open the command line console, run a simple command to start the R programming language, and then submit the command (MassMine) to start the software.[10] The console interface allows users to save their API usernames and passwords for quick restarting of the software, and gives the option to edit simple text-based configuration files that direct the software on the kinds of data to collect and the duration of collection. MassMine 0.1.0 supports installation on both local computers and hosted servers (e.g., UF Research Computing), with the same functionality available for both. MassMine was designed to operate properly with minimal system and bandwidth demands for accessing, pulling, archiving, and processing data. MassMine 0.1.0 runs efficiently on older computers with minimal requirements for local users and ensures greater ease of support by central service providers. On servers, MassMine can run as a single instance or as multiple parallel instances, with MassMine hosting by UF Research Computing starting in 2014. This project grant period (5/1/2015-4/30/2016) will build on existing support. UF Research Computing's support for UF researchers extends to their collaborators and will support the training program for this project. MassMine was also designed for lightweight and small storage demands, even with big data research. Initial collection of trend data from across the US encompassed more than 1.14 million lines of data when pulling data at the maximum bandwidth allowed by Twitter's API, which resulted in a small data set (slightly over 120 MB). Because of the inherent redundancy of social media and web application data, once compressed the data was less than 6 MB.
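The storage figures above (slightly over 120 MB raw, under 6 MB compressed, roughly 20 to 1) follow from how repetitive trend data is. The sketch below is only an illustration with synthetic data: the topics, place names, timestamps, and line layout are invented, and gzip is assumed as the compressor because the proposal does not say which one was used.

```python
import gzip


def sizes_before_and_after(lines):
    """Return (raw_bytes, gzip_bytes) for a list of text lines."""
    raw = "\n".join(lines).encode("utf-8")
    return len(raw), len(gzip.compress(raw))


def demo():
    """Build synthetic, highly repetitive trend records and compress them."""
    # Hypothetical stand-ins: the same few topics recur across places and hours.
    topics = ["#election", "#worldcup", "#oscars"]
    places = ["Gainesville", "Columbus", "Lexington", "Clemson"]
    lines = [
        f"2014-06-{day:02d}T{hour:02d}:00:00,{place},{topic}"
        for day in range(1, 29)
        for hour in range(24)
        for place in places
        for topic in topics
    ]
    return sizes_before_and_after(lines)
```

Calling `demo()` returns the two byte counts. The exact ratio depends on the data, so the synthetic result should not be read as a reproduction of the figures reported in the proposal.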
Work Plan

Specific tasks to be completed during the grant period, as detailed in the appendix, are: 1) develop a GUI that creates and edits the configuration file; 2) build the Export & Processing Module for storing data in different formats; 3) create a training program with in-person and online trainings, providing resources for installing and using MassMine, developing questions for data research, and broader resources for humanities researchers in doing data research; 4) test and review features and functions of MassMine with the Scholars Group of humanities researchers; and 5) release MassMine 1.0. The Project Team will meet monthly to review progress on goals and project timeline milestones. Work plan feasibility is evident from planned activities and assigned responsibilities for Project Team members for all work elements. The Project Team has explicit and reasonable goals for the life of the grant, team members with appropriate skills and successful collaborative project experiences, and technical and institutional support to achieve project goals. The Project Team's diverse members bring unique skills and perspectives along with support and stakeholder commitments from their institutional areas.

[8] Taylor, M. The Moment of Complexity: Emerging Network Culture. Chicago: U of Chicago P, 2001.
[9] Dobrin, S. Postcomposition. Carbondale: SIU Press, 2011.
[10] See appendix with guide to running MassMine with textual descriptions and screen shots.

The proposed project results and products will be:
1) Graphical User Interface (GUI), complementing the console interface, with both interfaces served by the same engine, and with parallel functionality accessible through both.
2) Export & Processing Module: additional data storage options (MongoDB, SQL) and a module for processing (data curation, with MassMine storing complete raw data, which includes both useful and extraneous attached data) and exporting data (CSV, TXT, XLSX) for import into various software as needed for the research goals (e.g., IBM's SPSS, MS Excel, WordSeer, MALLET).
3) Full Training Program: Planned training sessions include in-person trainings at UF and Clemson and online webinars, as well as guides, documentation, and resources for enabling data research (e.g., how to build data collections and how to process and export the data for analysis). All training sessions will include discussion of critical and ethical considerations for data research, with socio-technical concerns related to protection of human subjects for data privacy and IRBs.
Planned training sessions will include:

Installing MassMine on Servers: Training for infrastructure providers on MassMine server installation for individuals and groups of researchers; will include technical aspects and discussion of central service provider needs to support MassMine and data research.

Installing and Using MassMine for Humanities Data Research:
  Session One: Installing and using MassMine on local computers
  Session Two: Using MassMine hosted by UF Research Computing
  Session Three: Developing research questions, project scope, and goals for using MassMine (individually and collaboratively); training will include software and methodological assistance, discussion of data acquisition strategies for statistical needs, and intellectual goals

MassMine for Humanities Data Research: Session without installation; will cover using MassMine on UF Research Computing servers, developing research questions, project scope, and goals for using MassMine individually, review of software and methodological concerns including data acquisition strategies for statistical needs and intellectual goals, and other supports.

MassMine for Teaching Data Research in the Humanities: Will build from the prior session, providing an overview of and considerations for teaching with MassMine in the classroom.

MassMine for Data Research for Children's Literature: A version of the Data Research in the Humanities training focused on Children's Literature, for gathering feedback on supporting a specific area with focused research needs and on how these needs inform larger concerns.

Advanced Data Training Session(s): To be planned in consultation with the Scholars Group. Possible topics: methods for inspecting and querying collected data, tools and procedures for exploratory data analysis, and hypothesis-driven descriptive and inferential statistical investigations.

Staff

An Advisory Board comprised of Humanities Scholars from several institutions will contribute expert guidance on all MassMine project activities.

Contributed Cost Share: Project Participants from the University of Florida

Sidney I. Dobrin, PhD, Project Director, Research Foundation Professor of English, UF (.20 FTE cost share, totaling $24,037). Project Role: Dobrin will guide the overall project, support communication among Scholars Group and Advisory Board members, develop partnerships with other universities for central service support, and collaborate on outreach to specific research communities in the humanities.


Laurie N. Taylor, PhD, Project Co-Director, Digital Scholarship Librarian, UF (.10 FTE cost share, totaling $9,034). Project Role: Taylor will guide the project and focus on the training program, including trainings and materials.

Matthew Gitzendanner, PhD, Project Co-Director, Biological Data Scientist, Research Computing Training Coordinator, UF (.06 FTE cost share, totaling $5,825). Project Role: Gitzendanner will guide the project, support training for central service providers and the Scholars Group using MassMine hosted by UF Research Computing, and provide expertise on developing software for research.

Suzan Alteri, MLIS, Curator of the Baldwin Library of Historical Children's Literature, UF (.05 FTE cost share, totaling $3,368). Project Role: Alteri will liaise with Children's Literature scholars for supporting MassMine training, research question development, reference consultations, and collaboration with other institutions.

Melissa Clapp, MLIS, Humanities Librarian & Digital Humanities Library Group (DHLG) Scholars Studio Coordinator, UF (.05 FTE cost share, totaling $4,013). Project Role: Clapp will liaise with the DHLG to support MassMine training activities, reference consultations, and collaborations with humanities librarians at other institutions and for ongoing humanities data research support.

Richard Freeman, PhD, Anthropology Subject Specialist Librarian & Digital Humanities Library Group (DHLG) Member (.05 FTE cost share, totaling $3,687). Project Role: Freeman will collaborate with the other project team members to provide and coordinate the Digital Humanities Library Group's support for training and outreach activities, including liaising support with scholars in the social sciences and humanities.

NEH Funding Request: Project Participants

Aaron Beveridge, PhD Student & MassMine Developer, UF ($12.12/hour for 809 hours, totaling $10,189). Project Role: Beveridge, as the MassMine Project Assistant, will coordinate communication, schedule trainings, gather feedback, and serve as the Scholars Group's primary contact for developing research questions, using MassMine on UF Research Computing's servers, and testing the Export & Processing Module.

Nicholas Van Horn, PhD Student & MassMine Developer, Ohio State University (1 FTE grant funded, totaling $33,303). Project Role: Van Horn, as the MassMine project programmer, will be responsible for the Export & Processing Module, the GUI, and adding any required data sources, features, or functions informed by the Scholars Group and Advisory Board. Van Horn will also be responsible for managing version control, debugging, and updates through the GitHub system.

Final Products and Dissemination

Final products include releasing all code as open source on GitHub for the MassMine GUI and Export & Processing Module, as well as the full training program and outreach resources. Dissemination will occur through the training program and by project participants. Dissemination will also include press releases on the tool and research projects using MassMine, email announcements to scholarly lists, trainings at THATCamps in 2015, Florida Digital Humanities Consortium events, and others. The Project Team will seek for MassMine to be included in courses at UF (e.g., Data Literacy Common Core course, all UF undergraduates), Clemson University, and other institutions. Possible funding sources for subsequent phases include NEH's Digital Humanities Implementation Grants, The Social Media Research Foundation, and others. MassMine code is open source, so any researcher or developer can access the code and submit revisions. Members of the Project Team plan to continue contributing code for research and teaching needs.


Budget Form (OMB No. 3136-0134, expires 7/31/2015)

Applicant Institution: University of Florida
Project Director: Sidney Dobrin
Project Grant Period: 05/01/2015 through 04/30/2016

Item (computational details)                                          Year 1     Project Total
                                                                      (05/01/2015-04/30/2016)
1. Salaries & Wages
   Nicholas Van Horn; Postdoctoral Associate, Temporary Software
   Developer (1), 100%                                                $29,446    $29,446
   Aaron Beveridge; OPS Library Outreach/Training (1),
   $12.12/hr x 31 hrs x 26.1 pay periods, 100%                        $9,807     $9,807
2. Fringe Benefits
   13.10% of $29,446 base                                             $3,857     $3,857
   3.90% of $9,807 base                                               $383       $383
3. Consultant Fees                                                               $0
4. Travel                                                             $0         $0
5. Supplies & Materials                                                          $0
6. Services
   UF Research Computing Matching Program                             $3,200     $3,200
7. Other Costs                                                                   $0
8. Total Direct Costs                                                 $46,693    $46,693
9. Total Indirect Costs (28.5%; DHHS agreement of 06/28/13)           $13,307    $13,307
10. Total Project Costs (direct and indirect costs for entire project)           $60,000

Years 2 and 3 are left undated on the form and carry $0 in direct and indirect costs.

11. Project Funding
   a. Requested from NEH
      Outright:                      $60,000
      Federal Matching Funds:        $0
      TOTAL REQUESTED FROM NEH:      $60,000
   b. Cost Sharing
      Applicant's Contributions:     $64,203
      Third-Party Contributions:     $0
      Project Income:                $0
      Other Federal Agencies:        $0
      TOTAL COST SHARING:            $64,203
12. Total Project Funding            $124,203


Budget Narrative

Salary & Wages plus Fringe (UF) NEH Request ($46,693)

The project team plans to hire the two original MassMine creators: Nicholas Van Horn, Data Scientist and MassMine Developer, as a Postdoctoral Software Developer (1 FTE totals $33,303) to develop the Export & Processing Module, graphical user interface (GUI), and other enhancements and supports identified by the Scholars Group, Advisory Board members, and other humanities scholars who will be providing feedback; and Aaron Beveridge, PhD Student in English, as MassMine Project Assistant (809 hours at $12.12/hour, totals $10,189) to coordinate communication, schedule trainings, and serve as the Scholars Group's primary contact.

Services

UF Research Computing matching program ($3,200).

Salary & Wages plus Fringe (UF) Contributed Cost Share ($49,963)

Cost share will be provided by Department of English key participants as follows: UF Research Foundation Professor Sidney Dobrin (Project Director) (.20 FTE totals $24,037) will lead and guide the overall project, and pursue new partnerships and collaborations with universities and research communities in the humanities to promote and disseminate MassMine.

Cost share will be provided by Smathers Libraries key participants as follows: Digital Scholarship Librarian Laurie Taylor (Project Co-Director) (.10 FTE totals $9,034) will collaboratively guide the overall project, focusing on implementing the full training program with all collateral resource development. Melissa Clapp (.05 FTE totals $4,013) will provide and coordinate the Digital Humanities Library Group's support for training and outreach activities. Suzan Alteri (.05 FTE totals $3,368) will provide and coordinate support for training and outreach activities with Children's Literature scholars.
Richard Freeman (.05 FTE totals $3,687) will collaborate with the other project team members to provide and coordinate the Digital Humanities Library Group's support for training and outreach activities, including liaison support with scholars in the social sciences and humanities.

Cost share will be provided by Research Computing key participants as follows: Data Scientist & Research Computing Training Coordinator Matthew Gitzendanner (Co-PI) (.06 FTE cost share, totals $5,825) will support users of MassMine hosted by UF Research Computing.

Indirect Costs (UF) NEH Request ($13,307)

NEH funding is requested for indirect costs at UF's IDC rate of 28.5%, applied to $46,693 in base direct costs for the single project year.

Indirect Costs (UF) Contributed Cost Share ($14,240)

This represents IDC for UF's contributed cost share of $49,963.
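The budget figures above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch in Python, not part of the submitted budget: the dollar amounts are copied from the budget form and narrative, while the variable names and the one-dollar rounding tolerance are assumptions made for illustration.

```python
# Cross-check of the budget totals. Dollar figures are copied from the
# budget form and narrative; names and the $1 tolerance are illustrative.
IDC_RATE = 0.285  # federally negotiated indirect cost rate (28.5%)

neh_direct = {
    "Van Horn salary": 29_446,
    "Van Horn fringe (13.10%)": 3_857,
    "Beveridge wages": 9_807,
    "Beveridge fringe (3.90%)": 383,
    "UF Research Computing matching program": 3_200,
}

direct_total = sum(neh_direct.values())
neh_indirect = 13_307            # as stated on the form
neh_total = direct_total + neh_indirect

cost_share_direct = 49_963       # as stated in the narrative
cost_share_indirect = 14_240     # as stated in the narrative
cost_share_total = cost_share_direct + cost_share_indirect

project_total = neh_total + cost_share_total


def within_a_dollar(stated: int, base: int, rate: float) -> bool:
    """True if a stated amount equals base * rate up to $1 of rounding."""
    return abs(stated - base * rate) <= 1


print(f"NEH direct costs:   ${direct_total:,}")
print(f"NEH request:        ${neh_total:,}")
print(f"Cost share:         ${cost_share_total:,}")
print(f"Total project:      ${project_total:,}")
print("Indirect figures consistent with 28.5%:",
      within_a_dollar(neh_indirect, direct_total, IDC_RATE)
      and within_a_dollar(cost_share_indirect, cost_share_direct, IDC_RATE))
```

The tolerance is needed because the form rounds each line to whole dollars, so the stated indirect and fringe amounts match the percentage rates only to within a dollar.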


Biographies

Advisory Board

An Advisory Board will contribute, at no cost to the project, expert guidance on the MassMine project activities, including software, functionality, training program, documentation, and related products. The Advisory Board will be composed of humanities scholars from several institutions who are investigating data research supports in the humanities. Advisors include: Kenneth Kidd, PhD, English and Center for Children's Literature & Culture, UF; Cathlena Martin, PhD, Game Studies & Design, University of Montevallo; and Sean Morey, PhD, English, Clemson University.

Project Role: Advisors will evaluate all technical features, GUI design, and the training program in terms of how all elements together support humanists in developing research questions and practices for data research. The Advisory Board's role is to: 1) provide guidance in framing humanities research questions and practices needing data research support; 2) provide expert perspectives about system functionality, interface design, and the MassMine training program; 3) recommend and select Scholars Group participants (Advisory Board members may also elect to serve); and 4) promote MassMine to other scholars in their research communities.

NEH Funding Request: Project Participants

Nicholas Van Horn, PhD Student & MassMine Developer, Ohio State University, is the co-creator of MassMine along with Aaron Beveridge. He is an active researcher in the interdisciplinary field of computational cognitive neuroscience. Work in this area represents the convergence of advances in a number of related fields, including neuroscience, psychology and psychophysics, computer vision, artificial intelligence, mathematical modeling, as well as computer science more broadly construed.
His work on visual perception, learning, and memory has emphasized the collaborative nature of the field, resulting in many multi-author peer-reviewed publications in high-impact journals and talks/posters at top conferences. He has won multiple research awards for his work, and his mathematical and computer science expertise has enabled him to design and program many strictly controlled experimental protocols, including a project using the functional magnetic resonance imaging (fMRI) scanner at OSU to study patterns of activity in the brain during analogical reasoning. Further, his focus on computational modeling led to the development of several computer models of human memory, as well as a large-scale investigation of visual object recognition that compared performance of a biologically inspired computer algorithm to the results of human performance from five behavioral studies. The work required many thousands of lines of custom software and was deployed on the Linux cluster at the Ohio Supercomputer Center; the results led to a published journal article and a talk at the Vision Sciences Society, the premier vision conference in North America. In parallel with this research, he actively writes and maintains open source software related to writing and productivity, statistical analysis, and other core software functionality such as tools for automated text processing. This work in part led to the development of the current functionality of MassMine, for which he remains the lead software developer.

Aaron Beveridge, PhD Student, MassMine Developer, English Department, UF, is the co-creator of MassMine along with Nicholas Van Horn. His research investigates the intersection of data science and humanities research paradigms, focusing on the importance of tool creation and software development as they motivate the ongoing expansion of available research methodologies in the digital humanities.
With an emphasis on writing studies and circulation studies, his current project tests theories of hypercirculation as they attempt to explain the delivery and recirculation of digital media within complex networks. He presented his work with MassMine at the largest international conference for writing studies, rhetoric and composition, the Conference on College Composition and Communication (CCCC 2014), and he has been accepted to present an update of the software along with new data analysis at the same conference in 2015. He is currently working with the George A. Smathers Libraries at UF to develop trainings for the Scott Nygren Scholars Studio (a Digital Humanities lab) to teach Arduino microcontroller programming/prototyping, and an additional set of trainings that teach text mining and natural language processing.

Contributed Cost Share: Project Participants from the University of Florida

Sidney I. Dobrin, PhD, Project Director, Research Foundation Professor of English, currently serves as Graduate Coordinator for the Department and for ten years served as Director of Writing Programs in the English Department. Dobrin is the Founding Director of Trace Innovations Initiative, an online hub for research in media ecology, technology, and writing. Dobrin has published seventeen books about writing, technology, ecology, and media. He continues to publish in these research areas and anticipates the release of three new books this year with others to follow. His 2011 book Postcomposition received the W. Ross Winterowd Award for best book published in composition theory. Dobrin is frequently an invited, keynote, and plenary speaker at conferences and universities, both internationally and domestically.

Laurie N. Taylor, PhD, Project Co-Director, Digital Scholarship Librarian, conducts research to create scholarly cyberinfrastructure through data/digital curation and digital scholarship, while developing socio-technical supports (people, policies, technologies, communities) to create, sustain, and integrate digital scholarship and data curation across communities, and fostering an environment of radical collaboration made possible in the digital age or the age of Big Data. She works heavily with the Digital Library of the Caribbean (dLOC, serving as technical director for this international collaborative), UF Digital Collections, UF Research Computing, and the SobekCM Open Source software community.
She has been the principal investigator, co-PI, and investigator on many grants, including co-principal investigator on the ARL PD Bank, a digital scholarship project to centrally collect academic library job position descriptions for immediate and long-term analysis, and for planning to meet needs related to future changes in academic libraries in the digital age. Her teaching and training spans undergraduate Digital Humanities courses, graduate writing courses, and workshops on digital technologies. She has published refereed articles on data curation, digital scholarship, collaborative international digital libraries, library and information science, digital media, open access, and literature; and she co-edited a collection on digital representations of history and memory, Playing the Past: Video Games, History, and Memory.

Matthew Gitzendanner, PhD, Biological Data Scientist, Research Computing Training Coordinator, coordinates the Research Computing training program, provides expert support as a Bioinformatics Specialist for Research Computing users, conducts research as a research faculty member in the Biology Department, teaches computational biology courses for undergraduate and graduate students, and develops software for scholarly research. His research spans a broad array of topics generally related to evolutionary genomics, with topics ranging from population and conservation genetics to genomics and bioinformatics.

Suzan A. Alteri, MLIS, Curator of the Baldwin Library of Historical Children's Literature, UF, conducts research on the materiality of the book, special collections in the classroom, and historical religious tracts. She works with the newly created Baldwin Library Scholars Council to determine grant proposals and publication opportunities for both graduate students and researchers working with books from the Baldwin Library.
She was the principal investigator on the grant "Forging a Collaborative Structure for Sustaining Scholarly Access to the Baldwin Library" and is a project team member on the "Developing Librarian: Digital Humanities Pilot Training Program." She has presented on The Little Golden Books, Digital Curation, Introduction to the Baldwin Library Digital Collection, Digital Collections and Foundations, and Digital Collections and Scholarship. In addition, Suzan has curated these exhibits: "Bigger, Better, Best: the Panama Canal in Children's Literature"; "When Phantasie Takes Flight: the Art & Imagination of Arthur Rackham"; and "Grimm Changes," on the work of the Brothers Grimm over time. Her most recent publication, "The Classroom as Salon: a Collaborative Project on Daniel Defoe's Robinson Crusoe," appeared in Digital Defoe: Studies in Defoe & His Contemporaries. She regularly liaises with the Department of English on campus, which includes the Center for Children's Literature and Culture.

Melissa Clapp, MLIS, Humanities Librarian & Digital Humanities Library Group (DHLG) Scholars Studio Coordinator, is the Instruction & Outreach Librarian for the Humanities & Social Sciences. She joined the faculty of UF in 2007. Her research interests include digital humanities, research practices of students, and learning methods. She holds a Master of Information Studies degree from Florida State University and an MA in English from Northern Illinois University.

Richard Freeman, PhD, Anthropology Subject Specialist Librarian & Digital Humanities Library Group (DHLG) Member, is presently working on two digital projects with UF faculty members. One project works with a body of historical photographs of the construction of the transcontinental railroad. The second is creating new visual content for the digital collection entitled Vodou Archive, housed within the Digital Library of the Caribbean (dLOC) in the UF Digital Collections. Freeman also worked as an archivist at the National Gallery of Art in Washington, D.C. and as an assistant professor of anthropology. He has made numerous presentations at conferences and has several publications on the culture of Argentine politics and visual anthropology. He is currently working on a paper about the digital photographs project and a chapter for an edited volume on Haitian Vodou ceremonies. He is also an active member of the DHLG and is presenting on building support for the digital humanities in libraries at the 2014 conference for the Florida Chapter of the Association of College & Research Libraries (FACRL).


Data Management Plan

MassMine is being developed specifically to support data research needs in the humanities. This includes the ability to access and engage with all levels of tools and data research. Open source code is essential to support external review for reproducible research, to support ongoing open development for data research in the humanities, and to enable and foster collaboration among humanists for data research. In developing MassMine, one of the Project Team's goals is for MassMine to exemplify open and collaborative approaches to software development and training in the process of improving access to data research. The same overall alignment will be used in making all technical decisions, including those related to the GUI for MassMine. Like MassMine's other components, the GUI will be based on open standards and compliant code to support use on any operating system with an open-standards-compliant web browser (Windows, Apple, Linux).

MassMine code is already publicly available through GitHub and will be released to GitHub on an ongoing basis. MassMine is released under the GPL license as open source for download by anyone. Using GitHub, others can also fork a copy of the code. Forking is the term for creating a new version of the software where developers can continue development on a separate trajectory and submit new changes and additions to the software. Versioning and debugging will be controlled through GitHub's update/submissions system, and new changes will be developed and released through that same system under the supervision of the MassMine team at UF. For success, all materials for the project need to be shared openly and as widely as possible. The investigators commit to openly sharing all data in a timely manner.
The proposed MassMine project focuses on software development and training, which do not involve any private or otherwise restricted data, and do not involve any data that would present a risk of disclosure. The team does not anticipate any privacy issues, ethical issues, or intellectual property issues. Because MassMine enables other research projects, and the research data collections created with MassMine could potentially raise privacy and other concerns, the project training program will explicitly include data privacy and IRB approvals as supporting resources for data research.

In addition to MassMine code on GitHub through regular releases, each major release version of the code also will be archived to the broadly accessible IR@UF. Project documentation, tutorials, and training materials will be hosted in the IR@UF. Materials will include documentation, project examples, sample data sets, guides on additional resource articles and related open source analysis software, etc. The Smathers Libraries at UF commit to archiving and making materials accessible on an ongoing basis and at project end. This is in keeping with the normal practices of the Libraries' commitment to open and expedient dissemination of grant products and grant materials (e.g., "Unearthing St. Augustine" grant materials) to support research needs and to assist in building a culture of grantsmanship. UF dedicates staff time to digital preservation and access from the Digital Production Services staff, IR@UF Manager, Digital Development & Web Services Team, Digital Librarian, and others.

The project will generate a variety of data materials, with the majority being code, training resources, and documentation. Specific forms include: whitepaper, planning materials, reports, webinar videos, training materials, and meeting notes. MassMine was developed in, and will continue to use, the R language as its underlying technology.
R is an open source language, as are all of the development environments for coding in R, so technical resources for the programming and software development of MassMine are freely available and well supported. Documentation will be embedded in source code, in separate ASCII files (e.g., plain ASCII, Asciidoc, HTML, XML), and/or in formatted files (e.g., PDF, DOCX, PPTX). Training and support materials will be stored in standard formats (e.g., HTML, PDF, AVI, PPTX, etc.). Researcher datasets and accompanying files will be made available in their original and normalized formats (brief list of selected, recommended formats).


The Project Team will use GitHub for sharing code and code documentation, with all data openly available for anyone through GitHub. For permanent and findable support, all grant data materials will be openly accessible and preserved in the IR@UF powered by the SobekCM software, which provides metadata for all materials (at the item, group, and aggregation levels), permanent identifiers and URLs, multiple file formats and digital object packages (preservation and access copies), and more. All materials for this project will be openly accessible and will be made available as soon as possible, with the supporting metadata for findability and usability. All project data will be made available at minimum twice each year, and the majority of the project data will be made available in regular releases each week or more frequently.

The Libraries are committed to long-term digital preservation of all materials in the UF Digital Collections (UFDC), including the IR@UF, and in UF-supported collaborative projects such as the Digital Library of the Caribbean (dLOC). Redundant digital archives, adherence to proven standards, and rigorous quality control methods protect digital objects. Through UFDC, the Libraries provide a comprehensive approach to digital preservation, including technical support, reference services for both online and offline archived files, and support services in the form of training and consultation on digitization standards and long-term digital preservation. The Libraries support locally created digital resources as powered by and hosted with the SobekCM Open Source Repository Software, including the UFDC, which contains over 381,000 digital objects with over 30 million files (as of February 2014). The Libraries create METS/MODS metadata for all materials.
Citation information for each digital object also is automatically transformed by the SobekCM software into MARCXML and Dublin Core. These records are widely distributed through library networks and through search engine optimization to ensure broad public access to all online materials.

In a practice consistent across all digital projects and materials supported by the Libraries, redundant copies are maintained for all online and offline files. The digital archive is maintained as the Florida Digital Archive (FDA), which was completed in 2005 and is available at no cost to Florida's public university libraries. The software programmed to support the FDA is modeled on the widely accepted Open Archival Information System. It is a dark archive and supports the preservation functions of format normalization, mass format migration, and migration on request. As items are processed into the UFDC for public access, a command in the METS header directs a copy of the files to the FDA. The process of forwarding original files to the FDA is the key component in UF's plan to store, maintain, and protect electronic data for the long term. If items are not directed to load for public access, they do not load online and are instead loaded directly to the FDA (more information).


College of Liberal Arts & Sciences
Department of English
4008 Turlington Hall
PO Box 117310
Gainesville, FL 32611
352-392-6650
352-392-0860 Fax

September 1, 2014

Dr. Sidney I. Dobrin
Department of English
CAMPUS

Dear Sid,

I am writing to confirm my commitment and participation as a member of the Advisory Board Team for your proposed project "MassMine: Collecting and Archiving Big Data for Social Media Humanities Researchers." This is an exciting project with great potential significance for interdisciplinary research, especially as the humanities embrace empirical and quantitative methods of research and knowledge production.

I am a scholar of children's literature, with particular interests in canon and field construction and histories of the children's archive, so this project is of particular interest to me. This semester, for example, I am teaching a graduate seminar on the children's literature archive, drawing on our preeminent Baldwin Library of Historical Children's Literature. In that class we are reading Moretti's Graphs, Maps, and Trees, an exploration of quantitative research for the humanities, and students will be conducting various experiments in data mining and analysis, albeit on a smaller scale. The opportunities for research when it comes to current children's literature and children's media are also exciting, especially since so far the conversation about children's media has been dominated by researchers outside the humanities. Children's literature scholars are just now beginning to turn to social research methods, and the MassMine project could greatly enhance the collective sense of possibilities. Childhood studies, moreover, is on the rise as an interdisciplinary field, and the MassMine project can both draw from and extend that field's range and import.

My sense is that the MassMine project has the potential to transform not merely how we conduct our work but also our understanding of what that work actually is, or could be.

I look forward to working more closely with you as this project develops.

Sincerely,
Kenneth Kidd
Professor and Chair

The Foundation for The Gator Nation
An Equal Opportunity Institution


September 9, 2014

Dear Dr. Sid Dobrin,

I am writing to confirm my commitment and participation as a member of the Advisory Board Team for your proposed project on MassMine: Collecting and Archiving Big Data for Social Media Humanities Researchers.

As a tenure-track assistant professor of Game Studies and Design, my teaching and research areas include a variety of game categories, including board, card, video, and tabletop role-playing games. My gaming emphasis comes out of a larger study of children's literature and culture. Additionally, I am the Director of the Honors Program at the University of Montevallo, support digital humanities projects with my Honors faculty, and have worked with our QEP librarian to incorporate information literacy into Honors courses.

I support MassMine for my research and for the good of my colleagues. MassMine will help provide faculty with an easy interface with which to do social media data mining. At a small, public liberal arts university such as mine, with no computer science program, we need access to training and user-friendly platforms that aid our online research. This need is true across COPLAC institutions and beyond.

My current research is on tabletop role-playing games, and I could use MassMine to assess the sociological impact of these types of games within the United States and determine how the narrative revolving around these types of games has shifted in the public perception since the late 70s. But this software can support a large variety of game-focused projects, as has already been demonstrated with Kyle Bohunicky's Game Studies and Cultural Preservation project.

Sincerely,
Cathlena Martin, PhD
Assistant Professor
Coordinator of Game Studies and Design (GSD)
Director of the Honors Program

Game Studies and Design
Hill House, Station 6501
Montevallo, AL 35115
T. 205.665.6501


September 1, 2014

4008 Turlington Hall
P.O. Box 117310
Gainesville, FL 32611-7310

Dear Professor Dobrin:

I am writing to confirm my commitment and participation for your proposed project on MassMine: Collecting and Archiving Big Data for Social Media Humanities Researchers. I am currently a member of the Advisory Board. Broadly, my general areas of expertise include Digital Media, Digital Humanities, Environmental Humanities, Technology Studies, and Writing Studies.

This project is important to my research as it provides a cutting-edge and robust platform that will allow me to use new digital tools toward data mining methodologies, helping me to investigate many of the questions I am currently exploring, especially as related to how the intersection of social media, digital writing, and emerging technologies forges new identities of nature and environment. My own research aside, the open-source and collaborative nature of the MassMine platform will provide many other scholars engaged in Digital Humanities with a new tool that will help them perform their own research.

In my view, MassMine is highly adaptable and can be used for many social media, big data, and digital humanities projects. As a scholar working in these areas, I find your work with MassMine impressive, important, and necessary given the current expansion and interest in the digital humanities and related fields (which I argue includes social media, big data, and digital archival research).

As part of my commitment to this project, I will dedicate time and expertise to the project to help it succeed. Thank you very much for the opportunity to serve on the advisory board, and please do not hesitate to contact me if you have any further questions.

Sincerely,
Sean Morey
Assistant Professor

DEPARTMENT OF ENGLISH
Clemson University
816 Strode Tower
Clemson, SC 29634
P 864 656 3193
F 864 656 1345


College of Arts & Letters
Associate Dean for Graduate Education
479 W. Circle Drive
Linton Hall
East Lansing, MI 48824
Telephone: (517) 353-[...]
Fax: (517) 355-[...]
hartdav2@msu.edu

Dear NEH ODH Review Committee Members,

I write to express my strongest support for the MassMine Implementation: Collecting and Archiving Big Data for Social Media Humanities Researchers project proposed by the team at the University of Florida. As a humanist scholar who relies on social media sources for data, I can attest to the difficulty cited by the MassMine team in accessing, archiving, and querying [...] often want full text of posts as well as metadata for analysis, requiring [...] information by hand or cost-prohibitive access to proprietary datastores. With a resource like MassMine, access to multiple sources of social media postings and associated metadata will be less expensive in terms of time and/or money to achieve and much, much easier to store than it is today.

Let me offer a detailed example of how I might [...] one of our ongoing [...] from social media users that mention words and phrases linked with foodborne illness, a controlled list used by state and local health department workers around the country. A support vector machine-based classifier then analyzes the natural language as well as post metadata (location, date, etc.) to assign each post a score. We then use the score to weigh [...] foodborne illness that bear further investigation by local health officials. The reasoning [...] disease surveillance and response while decreasing [...] severe illness from outbreaks that are detected too late. We call it a "stasis engine" because it weigh[...] from rhetorical stasis theory.

A major hurdle in the development of our work on this project has been access to the places where people share updates with information about their foodborne illness symptoms. What is "oversharing [...] for us, whether it is in a time/space window known to correspond to an outbreak (e.g. [...] for testing [...] for live or real-time alerts. We have had to gather our data for training [...] dollars in salary to do so. A tool like MassMine would make work like ours much, much easier!

Beyond my own personal interest in a system like MassMine, I see tremendous value in it for historians, anthropologists, and researchers doing [...] few distinct areas. Consider the value of using [...] zeitgeist, reflected in the activity of various social media platforms, associated with a particular cultural event or phenomenon. Recently, for instance, both social media and traditional media outlets have reported on the ways that different social platforms such as Twitter and Facebook reflect very different contemporaneous portraits of American culture. In the late Summer of 2014, at the same time that the unrest and controversy in Ferguson, MO was trending [...] folks were dumping [...] the ALS Association over on Facebook. Using [...] detail that is currently not feasible.

I have confidence that the MassMine team can make excellent use of the resources provided by the NEH to take a tool currently useful for a handful of researchers at the University of Florida and develop an open-source tool for humanities researchers around the country and the world. The project plan they propose is solid and achievable, as is their plan for dissemination, testing, and training. Moreover, MassMine is the sort of project that is useful enough [...] tech-savvy researchers that it will likely attract a large developer community. This is important whenever a team proposes an open-source project, because the biggest risk is that there will be insufficient development effort and know-how available among [...] forward once it has been given an initial boost. I do not see this problem with MassMine.

I could not be more excited to see MassMine move forward. I think it has tremendous potential to serve many researchers in ways that are not even fully obvious until the tool finds its way into the hands of creative humanist scholars. I wholeheartedly support the project, and I've let Dr. Dobrin and the others on the team know that I'm happy to be an early test user, an early adopter, and an advisor on the project if they find my assistance at all useful. Having [...], I believe this project is as safe a bet at the proposal stage as any I have seen.

Sincerely,

William Hart-Davidson, Ph.D.
Associate Professor, Writing [...] for Graduate Studies, College of Arts & Letters
Co-Director, Writing [...]

August 15, 2014

Colleagues:

I am writing in support of the MassMine collaborative project at the University of Florida. Over the last several years, Humanities scholarship has demonstrated increasing interest in the generic appellation "big data." Rather than limit scholarship to textual analysis or textual interpretation (in a text or in a cultural moment), as has often been the case, scholarship now explores the ways patterns, trends, ideas, beliefs, and so on can be identified by building significant data sets out of the traditional objects of scholarly focus. Big data has come to mean all of the information available for scholarly pursuit and the difficulty encountered in isolating such data and using it for research means. Data, we have come to realize, needs to be mined. This work has proven to be essential to Humanities scholarship as scholars begin to understand that in order for analysis to be effective, it depends on large sets of information, some of which cannot be fully accessed without the available tools of data collection (i.e., software). National scholars such as Stephen Ramsay and Lev Manovich have repeatedly demonstrated the need to situate Humanities scholarship in terms of data and data mining. Whether through locating trends over bodies of work or periods of time or via visualizing trends or concepts, both of these kinds of data mining projects are redefining Humanities scholarship in quantitative ways. In turn, academic work is rethinking how search works, from popular online portals such as Google to more nuanced algorithms and collections that challenge the ways we assemble information for various purposes. At the heart of all contemporary scholarship is search and how the data discovered in search is used.
The MassMine project situates itself as another important contributor to this kind of academic work and promises to be a leader in the field. In addition to tapping into traditional sources of information, MassMine promises to utilize open APIs in order to incorporate less obvious sources of information, such as Twitter, Facebook, and everyday social media platforms. By doing so, MassMine promises to generate a more complete network of intellectual and popular interactions that can greatly aid in various areas of scholarly research. As this grant proposal makes clear, however, the collection of data is often difficult. Not only does appropriate software need to be designed for specific purposes, but training must take place as well. As an open source platform without coding requirements, MassMine is designed to


ease the process as well as provide the necessary training for students and faculty to effectively use the software. MassMine taps into the open nature of APIs in order to utilize the Web and many social media platforms' emphasis on large-scale integration and development. In that sense, it is accessible financially (it is free and open source) and pedagogically (it will be accompanied by appropriate training and documentation). We might consider this proposal in contrast to traditional university Humanities research tools like simple library search, which often scans subscription portals one object at a time. A basic library search is not a data-driven search, since it only returns exact or near-exact matches that correspond to the university's paid holdings. That is, the typical university database is limited in how it returns responses or allows a researcher access to different patterns of information. We might then consider a data-driven search, such as that proposed by MassMine, that would explore a number of texts simultaneously, within a library's collection and elsewhere, identifying potential relationships, including relationships not yet visible or obvious. For instance, contemporary approaches to search will often reveal one dominant narrative regarding an issue (whether that issue is political, social, or economic) that attempts to speak for the moment in its entirety. A broader data-driven search that is not limited in scope or textual analysis might, instead, turn up a number of narratives or patterns that challenge dominant thinking on the subject or allow for new insights the researcher could not obtain via a typical library search. This last point is where I see MassMine playing a significant role in our current period of data and digital search. MassMine can alter the teaching and performance of university scholarship at multiple levels and for faculty and students.
The more information we can access and work through, the better we become at understanding broader implications of phenomena and their meanings. The more information we can access and work through, via an application like MassMine, the better equipped we are to approach complex narratives and situations. If MassMine were available at the University of Florida, I would ask undergraduate students I work with to use it from the University of Kentucky, where I teach. I see its contribution as an incredibly valuable addition to the work we do in the Humanities. For too long we have taught our students limited search skills via the very limited tools that they have access to. My own scholarship has attempted to theorize such shifting search strategies and the frustrations with pedagogies designed in opposition to digital search. With MassMine, teaching can greatly shift to include strategies appropriate to the digital age we work within.


Because MassMine will be housed at the University of Florida, it is important to note how it can serve not only the land grant mission of the university, but the broader mission of higher education: to allow continued access to information for intellectual and research purposes. Given many private efforts to limit information accessibility, MassMine will be extremely important to scholarship. As a public university project, MassMine will benefit researchers and students in a much broader way than other applications (many of which limit access via fees or other restrictions). I strongly recommend that the NEH support this application. MassMine is exactly the kind of tool we need in contemporary research and scholarship.

Sincerely,
Jeff Rice
Martha B. Reynolds Professor in Writing, Rhetoric, and Digital Studies
University of Kentucky


MassMine: collecting and archiving big data for social media humanities researchers
University of Florida

Appendix: Detailed Work Plan

Pre-Startup Phase: 2014
  April: MassMine 0.1.0 released on GitHub (Van Horn, Beveridge)
  April: Presentation on MassMine at THATCamp Gainesville (Beveridge)
  Fall: Capital University, course on data science using MassMine (Van Horn)
  Nov. 13: Training on using MassMine on UF Research Computing Servers (Beveridge)
  Facebook and Wikipedia APIs added as data sources by January 2015 (Van Horn)

May-June 2015: Scaling for Many Users: Server Installation Scaled; Updates for More Backend Supports
  Confirm project charter with graduate student and postdoctoral researcher mentoring plans and user support processes for Scholars Group and all users (Project Team)
  Software: Share on ongoing basis on GitHub and with major releases to the IR@UF (Van Horn)
  Software: Implement storage support options for SQL and MongoDB; begin developing Export & Processing Module; initial development activities for adding the GUI (Van Horn)
  Collaborate with Advisory Board to identify potential Scholars Group members (Project Team)

July 2015
  Software: Develop specifications to add data cleaning and subsetting functionality (Van Horn)
  Develop initial training program materials, user testing and feedback processes, and documentation for iterative development for new features, functions, GUI display, and related sociotechnical workflows and data practices (Project Team)
  Training Session on Installing MassMine on Servers: for infrastructural service providers (e.g., central computing units, research groups/units/departments) for MassMine server installations supporting individuals and groups of researchers. Training to include: technical aspects, discussion and planning for supporting sociotechnical processes for data privacy needs and integrating workflows with IRBs, and feedback from central service providers on their needs to support MassMine and data research (Beveridge, Dobrin, Gitzendanner)

August 2015
  Confirm Scholars Group (57 humanities scholars from UF and Clemson); ensure access to MassMine hosted on UF Research Computing (Project Team)
  Schedule training sessions for in-person and online webinars in consultation with Scholars Group and specific additional user groups; begin promoting training sessions (Project Team & DHLG)
  Software: Release beta Export & Processing Module with data curation functionality (e.g., remove unnecessary HTML, markup, irrelevant URLs, spurious symbols or punctuation, non-specific data attached to API access) and with export to CSV and XLSX (Van Horn, Beveridge)
  Update documentation and training materials for installing MassMine, using Export & Processing Module, and submitting bugs and feature requests through GitHub as tailored to various user groups including those new to GitHub (Project Team)

September 2015: Training Sessions with Scholars Group (Project Team, DHLG)
  Trainings with humanities scholars for their data research projects, feedback on MassMine functions and resources, and establishing overall collaborative development processes
    Session One (in person, webinar): Installing and using MassMine on local computers
    Session Two (in person, webinar): Using MassMine on UF Research Computing systems
    Session Three (in person at UF and Clemson): Developing research questions, project scope, and goals for using MassMine (individually and collaboratively) with software and methodological assistance, discussion of data acquisition strategies for statistical needs, intellectual goals, and sociotechnical concerns with data privacy, IRBs, and other supports (session feedback to create project examples for future online trainings)
  Liaise with Scholars Group (users on UF Research Computing or local computers) for initial data collection for projects; assistance as needed; documenting discussions to inform development


October 2015: Collaborative Development and Iterative Testing (Project Team)
  Scholars Group data collection ongoing with training and support for project duration
  Liaise with Scholars Group on inspecting data output including checking for inconsistencies/abnormalities, supporting technical needs for computing descriptive statistics on initial data as an added check, and gathering feedback on user experience
  Liaise with Scholars Group for iterative testing processes on technical functionality and instructions on preliminary data inspection with examples in common analysis software (Excel, SPSS, R) regarding the analysis and interpretation of data; as well as instructional support on data research methods including identifying measurable variables, approaches to "existing data" and correlation designs, and core techniques of descriptive statistics

November 2015: Ongoing Collaborative Development and Iterative Testing (Project Team)
  Software: Finish software development updates identified by Scholars Group; release updated Export & Processing Module (Van Horn)
  Scholars Group data collections will now include large data tables of information, well suited to and needing methods for grouping and narrowing of search for variables of interest
  Liaise with Scholars Group for analysis and data visualization software needs with the Export & Processing Module

December 2015: Iterative Testing and Development
  Software: Update the Export & Processing Module for functionality and tool design based on feedback from Scholars Group (Van Horn)

January 2016: GUI Alpha Release; Ongoing Collaborative Development & Iterative Testing
  Software: Release MassMine GUI alpha version; finish updated Export & Processing Module; complete any remaining software updates for GUI standalone installations and as identified by Scholars Group data collection (Van Horn)
  Liaise with Scholars Group to support installing GUI software version; gathering their feedback on the GUI version (Project Team)

February 2016: Finalize GUI Release, Documentation and Resources, Advanced Data Trainings
  Create documentation for MassMine GUI interface (Project Team, DHLG)
  Scholars Group reviews documentation, provides feedback, suggestions for resources related to their research questions and areas (Project Team, DHLG)
  Advanced Data Training Sessions: Liaise with Scholars Group on needed trainings and conduct training sessions; possible topics may include methods for inspecting and querying collected data, tools and procedures for exploratory data analysis, and hypothesis-driven descriptive and inferential statistical investigations (Project Team, DHLG)

March 2016: Training Sessions for New Users and Scholars Group (in person and online)
  MassMine for Humanities Data Research: Session on using MassMine; covers developing research questions, project scope, goals for using MassMine individually, review of software and methodological concerns including data acquisition strategies for statistical needs and intellectual goals, sociotechnical concerns with data privacy, and IRB (Project Team, DHLG)
  MassMine for Teaching Humanities Data Research: Session builds from MassMine for Humanities Data Research, covers examples, opportunities, and considerations for teaching and using MassMine in the classroom (Beveridge, Dobrin, Van Horn)
  MassMine for Children's Literature Data Research: Session tailored to support Children's Literature researchers; the session will inform how best to support other specific research areas and how specific needs shape shared concerns (Project Team, DHLG)

April-May 2016: Promotion & Outreach for MassMine GUI Version
  Finalize documentation and materials (Project Team, DHLG)
  Press release for official MassMine 1.0; promoting MassMine at THATCamps in 2015 and other promotional activities (Project Team, DHLG)


Appendix: Selected References and Resources

MassMine and Related Resources
  MassMine Open Source Software Code on GitHub:
  Training Workshop Handout for Using MassMine on UF's Research Computing Cloud Server for upcoming training on Nov. 13, 2014,
  MassMine Project Assistant, Position Description:
  Case Studies of Using MassMine for Data Collection, Curation, & Analysis in Humanities Research,
  Beveridge, Aaron. Conference Presentation. Writing Studies and Data Science in the 4th Paradigm. Indianapolis, IN: CCCC, March 19-22, 2014.
  Beveridge, Aaron. Presentation and Data Training. Humanities Software Development: Data Mining and Writing Studies. Gainesville, FL: THATCamp-Gainesville, April 24, 2014: -development-data-mining-andwriting -studies/
  UF Research Computing Resources:
    To support scholars in fully testing MassMine within the full research process, Project Team members will utilize and integrate trainings on related needs (e.g., trainings on GIS, SPSS, data management, etc.); examples from UF:

Humanities Data Research
  Jockers, Matthew L. Text Analysis with R for Students of Literature. Springer, 2014; with textbook materials available for download from author website: with -r-for-students-ofliterature/
  Graham, Shawn, Ian Milligan, and Scott Weingart. Big Digital History: Exploring Big Data through a Historian's Macroscope. Imperial College Press, 2014 and online: http://w
  The Programming Historian, resources on coding and development for data collection and processing,
  Flanders, Julia and Trevor Muñoz. "An Introduction to Humanities Data Curation." DH Curation Guide: a Community Resource Guide to Data Curation in the Digital Humanities,
  Williford, Christa and Charles Henry. One Culture: Computationally Intensive Research in the Humanities and Social Sciences, A Report on the Experiences of First Respondents to the Digging into Data Challenge. Washington, DC: Council on Library and Information Resources, 2012:


Appendix: Humanities Research Questions Needing MassMine's Data Collection, Curation, and Analysis Functionalities

Examples collected following MassMine's version 0.1.0 release in summer 2014.

Dislocating the Hip: Accounting for taste through spreadable media
Shannon Butts, English Department, University of Florida

Over the past few decades, a rhizomatic approach to scholarship has expanded the ways we map information, embracing multiplicity alongside an archeology of knowledge. Yet within contemporary media studies, terms like "going viral" still invoke biological metaphors that muddle the power relations between producers and consumers. Who really crafts what we view as "hip," and how can we attribute emerging trends? According to Henry Jenkins, the viral emphasis on replication and transmission fails to consider the networked reality of everyday communication: like a childhood game of Telephone, ideas change through sharing. Jenkins's work with spreadable media engages the viral, but also acknowledges the participatory aspect of media circulation, a process that often transforms, repurposes, or distorts information as it passes through diverse communities. The initial spread of "Fall's Hottest Fashions" might begin within the pages of Vogue, but go viral as a street style bricolage posted to Pinterest or trending on Twitter. Building on Jenkins's concept of spreadable media, this project works to map the circulation of cool and the hype of hip within the world of popular fashion, dislocating traditional origin stories of producer and consumer. My research will use MassMine to examine print publications alongside user-generated content to trace the evolving multiplicity of what's hot or not in the Twittosphere.
Tracking styles from initial appearance, through circulation and variation, I aim to better understand how social media creates a platform for popularity and controls trends of information. Orange may be the new black, but what about next season?

Game Studies and Cultural Preservation: Mapping Archival Discourse
Kyle Bohunicky, English Department, University of Florida

In James Newman's Best Before: Videogames, Supersession and Obsolescence, Newman argues that digital games, as a medium, have proven remarkably durable, but despite cultural acceptance, these media demonstrate a troubling fragility. Questioning the putative naturalness of decay and obsolescence in the medium, Newman suggests that digital games are rapidly disappearing thanks to marketing, advertising, and journalistic discourse. The digital games industry, Newman suggests, has shown little interest in preservation and the project of game history and heritage. At a discursive level, a large part of this issue is that the digital games industry, developers, and players are entrenched in a language of computing that creates an illusory sense of archival activity. While battery saves, password systems, save states, memory cards, and save spots have helped players preserve their personal activities within games, the discourses that emerged from and around game memory are opaque and widely understood as consistent throughout the history of games. I suggest that closer attention to the discourses and conversations these storage technologies developed from and developed can give the medium a stronger understanding of its past and future. Thus, this project intends to nuance Newman's claim about illusory archival discourse by showing how different storage technologies affect the discourse about the heritage and future of both digital games and play.
Such an investigation will need to span multiple communities including forums such as Neogaf, FAQ pages such as GameFAQs, and emulation sites such as Zophar's Domain and NESbox. Thus, MassMine's web mining feature that pulls data from specific URLs will be useful to show the relationship between digital storage and the medium's future/past that emerges in its discourse. Additionally, this work will enable me to detail the archival strategies that players themselves have developed to resist the industry's neglect of its history.

Teaching Data Research in the Humanities and Social Science Classroom
Nicholas M. Van Horn, Department of Psychology, The Ohio State University

A key component of training in the social sciences is the development of research skills, including an understanding of the relevant terminology, classification, methods, and trends in use by investigators.


Central to these concepts is the collection and analysis of theoretically driven data. Students enrolled in a research methods course are taught to be informed consumers and producers of research. Commonly, to achieve this, each student is mentored through the process of the scientific method by conducting a guided research project. Often, this involves identifying and taking a developed research plan from conception, to evaluation, to dissemination by means of a written paper and/or presentation. In the classroom setting, however, time and resources limit the scope of viable research hypotheses possible in such contexts. As a result, research questions are restricted to those answerable by simple, non-experimental approaches such as qualitative, survey, and correlational designs. Further, access is commonly limited with respect to populations and data sources of interest, and the introductory nature of the course virtually ensures that students lack the necessary skills to acquire and examine high-value data. These restrictions can weaken the intrinsic motivation and enthusiasm of students forced into compromised research projects due to the logistic constraints of the classroom. I plan to use MassMine as a pedagogical tool in the teaching of research methods in the social sciences in the fall of 2014, primarily as a means to overcome a subset of these problems. MassMine will serve two primary functions in my course. First, it will enable access to a new class of "existing data" research designs using non-trivial data sources (e.g., social networks) that are relevant and meaningful to my students. Second, data analysis and exposition are challenging skills to teach and to learn.
Instructional examples are typically driven by inconsequential in-class surveys or existing "toy" data that tacitly separate the analysis of data from the research context that it rightfully belongs in. By contrast, MassMine will enable me to develop a quick research question live in class with student participation, collect data to test the class's hypothesis, and then immediately perform an analysis in real time. By situating the analysis within the context of development and acquisition, I believe students will connect with the quantitative and qualitative profile of the results in ways that theoretically devoid examples do not.

Tracking Images Across Social Media
Laurie Gries, Department of English, University of Florida

Circulation Studies, the study of rhetoric and writing in motion (Gries 2013), is an emergent area of study within two disciplines: Communication and Rhetoric and Composition/Writing Studies. While much important work has already been done in this area in regard to textual circulation, when it comes to visual rhetoric, studies of circulation are limited by the lack of software to easily access, organize, and analyze social media data. As Shepard Fairey's now iconic Obama Hope image makes evident, thanks in part to social media, images circulate, transform, and engage in divergent collective activities at viral rates. As described in "Iconographic Tracking: A Digital Research Method for Visual Rhetorics and Circulation Studies" (2013) as well as my forthcoming book Still Life with Rhetoric: A New Materialist Approach for Visual Rhetorics (2015), I have developed a digital research method called iconographic tracking to trace the circulation and transformation of viral images. This method has proven effective in tracking the viral circulation of a single image. However, several problems currently exist that Mineware has potential to address.
First, iconographic tracking largely depends on manual research, a time-intensive labor that is incapable of keeping up with a viral image in a digital age. In addition, this method currently relies on multiple software programs to store, organize, analyze, and visualize data. In the past, I have relied on Zotero to store data, but due to glitches with this software, I have lost a significant amount of data. In addition, Zotero demands user input to capture website and image URLs, a time-consuming process that software programs such as Mineware ought to be able to automate. Lastly, Zotero does not include analytical or visualization components; therefore, I have been forced to rely on manual coding methods and diverse visualization programs, each of which requires its own training. For such time-related and labor reasons, iconographic tracking needs a reliable one-stop software program such as Mineware to support its research methods. I am currently working on writing a book-length rhetorical biography of the Obama Hope image, a research project that demands more research with iconographic tracking. I plan to use MassMine to support this research project. Specifically, I will rely on MassMine to track the


circulation and transformation of the Obama Hope image across social media; capture website and image URLs; and analyze and visualize research findings.

Transgender Representation in Social Media
RL Goldberg, English Department, University of Florida

Recent years have seen a dramatic increase in transgender representation in mainstream film and television, but this has not been mirrored in literary fiction, a surprising lacuna given the liberal range of play and potential that fiction presents. Instead of fiction, as Jay Prosser shows in Second Skins: The Body Narratives of Transsexuality, transgender representation in literature almost exclusively takes the form of memoir and autobiography. Certainly, there is no dearth of transgender autobiography, historical or contemporary. As Joanne Meyerowitz points out, transgender narratives proliferated from the early 1920s as doctors, predominantly in Germany, agreed to provide gender-affirming procedures to patients with cross-gender identification. As North American media began publishing stories on sex change operations, readers with cross-gender identification were able not only to imagine surgical intervention for themselves, but also to find the language to express their desire for gender-affirming treatment. Yet even today, transgender writers predominantly write memoir rather than fiction. Though in recent years there has been relative interest in mainstream transgender literary fiction, examples are still limited. Instead, transgender authors seem to be producing fiction elsewhere: on their websites or blogs, self-publishing, or publishing with small presses. Increasingly, this is democratizing the publishing industry (for trans and cis writers alike) as writers who would not gain traction with mainstream publishing houses are finding niches and outlets for expression.
For my project, I propose using MassMine to explore transgender fiction being produced on the margins, particularly on social media. Through empirical analysis I will be able to understand current trends in transgender and queer fiction, and speculate on the future of transgender literature. Especially given how diffuse the transgender community is (international and diverse), such analytics are invaluable to understanding the production of fiction in the increasingly global context.

"No más combis": tracing contemporary public transport reform in Lima, Peru
Jamie Lee Marks, Department of Anthropology, University of Florida

Over the past few decades, social scientists have increasingly discussed infrastructural systems, arguing that scholarly discussions be broadened to include the individuals that circulate through, imagine, and (re)constitute these systems. My dissertation research will provide an ethnographic portrait of large-scale transportation reform currently underway in Lima, Peru. Since 2010, the Metropolitan Lima Municipal Council (MML) has prioritized implementing a series of regulations and public works to create an integrated transport system. The next phase of the reform will restructure 50% of public transportation vehicles and routes in the city over the next few years, and will require the gradual removal of the majority of vehicles and routes that most of Lima's residents have been using since the 1990s. Despite the importance of existing vehicles, routes, and transit workers in the daily journeys of Lima residents until present, municipal transport reform campaigns explicitly reference existing buses and those who operate them as too chaotic, unsafe, unclean, and informal to be part of the city's future.
These campaigns urge competing narratives about Lima's political, economic, and social history and future to the surface, charging conversations about mobility and transit and rendering social understandings of these phenomena available for critical analysis. Using a multi-sited ethnographic approach that combines digital and traditional participant observation, I will analyze how various social actors experience, remember, imagine, and narrate Lima's public transit infrastructure. My fieldwork is based on (1) participant observation in the spaces of social and sensory encounter associated with emerging, existing, and disappearing transit infrastructures; (2) semi-structured interviews with members of various publics involved in Lima's mobile landscape, including transit workers, journalists, urban planners, NGO workers, and Lima residents of varied ages and socioeconomic classes; and (3) discourse analysis of portrayals of the transportation reform circulated in online news forums, as well as in Online Social Network (OSN) spaces such as Twitter and Facebook. These digital discursive spaces present a novel challenge to analysts interested in citizenship, how residents imagine infrastructural reform, and the relationships


between municipal campaigns and public understandings. MassMine will allow me to systematically organize posts and comments on Twitter and Facebook, rendering them manageable and intelligible for critical analysis. This is a critical aspect of my research project.

Mapping Premediation on Social Media
Jake Greene, English Department, University of Florida

The rise of networked media in the twenty-first century inaugurated undeniable changes to how world events circulate within society, which, when viewed in light of the lingering cultural trauma of 9/11, illuminate the emergence of the medial phenomenon that Richard Grusin refers to as premediation. According to Grusin, premediation is the process through which a society anticipates the affective grounds of future trauma by mediating a variety of potential narrative paths. Although Grusin clearly states that premediation materializes in many cultural forms (film, literature, television, etc.), he is primarily interested in how the mainstream news media construct and disseminate narratives of premediation in response to environmental, economic, and political concerns. For this research, I propose to combine Grusin's theoretical formulation of premediation with an empirical application of MassMine's data mining software in order to identify premediated news stories and trace their circulation on Facebook and Twitter. Specifically, I am interested in analyzing how users construct original text in their posts when linking to premediated news stories as a way of testing Grusin's claim that premediation functions as an affective prophylactic against future trauma. From nightly reminders about the impending dangers of global climate change to recent speculations on the implications of conflict in the Middle East, the mainstream media is rife with premediated news.
As a case study for tracking premediation on social media, I will follow the (re)circulation of news stories related to the recent increase in sinkholes in the state of Florida. This story is useful not only because it provides a limited scope through which to test this methodology, but also because it connects to the more general category of ecological premediation.

Posthumanism
Melissa Bianchi, English Department, University of Florida
Contemporary scholars of human-animal studies have argued that ideologies perpetuating human social injustices are closely linked to those that justify the institutional exploitation of nonhuman animal species. For example, Cary Wolfe (Animal Rites: American Culture, the Discourse of Species, and Posthumanist Theory. Chicago: U of Chicago P, 2003) claims that "as long as it is institutionally taken for granted that it is all right to systematically exploit and kill nonhuman animals simply because of their species, then the humanist discourse of species will always be available for use by some humans against other humans as well, to countenance violence against the social other of whatever species, or gender, or race, or class, or sexual difference" (8). Wolfe's argument forges a significant link between humans and other animals by suggesting that institutional exploitation of and violence against both nonhuman species and certain human groups stem from a singular source: speciesism. He indicates that combating the marginalization of any category of living beings requires that we attend to how our discourse reiterates presumptions of superiority over social others. My research builds on Wolfe's work by tracing how a humanist discourse of species is employed and rallied against on one particular social media website: Twitter.
Because Twitter is often used as a platform to raise awareness of, advocate for, and attack particular ideologies through hashtag movements, the website offers a means for organizing and tracking social movements that defend the rights of marginalized groups. This project will use MassMine to gather data from two recent and popular hashtag movements, #Blackfish and #Yesallwomen, that speak to cetacean and women's exploitation, respectively. I will compare the rhetoric and circulation of these hashtags to examine what similarities, differences, and links exist in their discourses. From this data, we may determine how discussions of cetacean exploitation inform the ways we identify and define women as social others on Twitter, and suggest productive avenues for changing these discourses.


Appendix: Workshop Handout: Using MassMine on UF's Research Computing Cloud Server

Login and Startup
Remote login to the UF cloud server set up specifically for MassMine research, through a Linux console on a computer with the Ubuntu OS installed. (UF provides training on accessing cloud resources through any operating system.)

Startup screen with a basic text interface; users do not have to understand R code in order to collect data.


API Connection
MassMine checks the last configuration file and offers to reauthenticate the API connection to restart a similar data collection. Based on the configuration file information, MassMine automatically displays the API account for authentication. Authentication only needs to happen once; after that, users can run the software again without reauthentication, as long as the same API credentials are used. The API provides an authorization PIN, which syncs MassMine with the user's API access once the PIN is entered.


Configuration File
MassMine collects data based on the directions provided in the configuration file. The screenshot below shows the configuration file opened in a simple text editor. The configuration file is machine-readable and editable in any basic text editor on any operating system. Users can save and re-process multiple configuration files to run different kinds of data collection activities in the console application of MassMine. Templates of various basic configuration files are in development for use in trainings and experimentation by console application users.

Success Screen
MassMine responds to let the user know when a collection has finished successfully without error, with all data automatically saved to the cloud space provided by UF's Research Computing.