Data Lifecycle Management
Full Citation
Permanent Link: http://ufdc.ufl.edu/IR00000801/00001
 Material Information
Title: Data Lifecycle Management
Physical Description: Presentation
Creator: Norton, Hannah F.
Garcia-Milian, Rolando
 Subjects
Genre:
Spatial Coverage:
 Notes
Acquisition: Collected for University of Florida's Institutional Repository by the UFIR Self-Submittal tool. Submitted by Hannah Norton.
Publication Status: Unpublished
General Note: Presentation made at the second UF Research Computing Day, April 25, 2012. For more information see: http://www.it.ufl.edu/community/events/rcday/agenda-2012.html
Funding: This project has been funded in part with federal funds from the National Library of Medicine, National Institutes of Health, under Contract # HHS-N-276-2011-00004-C.
 Record Information
Source Institution: University of Florida Institutional Repository
Holding Location: University of Florida
Rights Management:
This item is licensed with the Creative Commons Attribution License. This license lets others distribute, remix, tweak, and build upon this work, even commercially, as long as they credit the author for the original creation.
System ID: IR00000801:00001

Full Text

PAGE 1

Data Lifecycle Management
Hannah Norton and Rolando Garcia-Milian
UF Health Science Center Libraries
Image credit: http://www.flickr.com/photos/blprnt/3642742876/in/photostream/

PAGE 2

Agenda
• The data lifecycle and data types
• Metadata and labeling your data
• Storage and preservation
• Data management planning
• Additional resources

PAGE 3

Data Lifecycle*
[Diagram of lifecycle stages: Study Concept, Data Collection, Data Processing, Data Distribution, Data Archiving, Data Discovery, Data Analysis, Repurposing]
* Based on Data Documentation Initiative (DDI) version 3.0 Combined Life Cycle Model

PAGE 5

Data generated throughout the lifecycle has different needs
• Raw data: some must be kept forever; other raw data can be discarded after the project is complete
• Intermediate data for analyzing and processing can often be discarded at the end of the computation, but computational methods should be preserved for reproducibility
• Final data should be made available indefinitely to the community

PAGE 6

How long do you need your data stored?
[Bar chart: percentage of respondents by retention period (Forever; More than 10 years; 6-10 years; 1-5 years; Less than a year) for three series: Raw Data (n=50), Intermediate/Working Data (n=48), Processed Data (n=49). Data labels in extraction order: 18.4%, 18.4%, 42.9%, 18.4%, 2.0%; 8.3%, 12.5%, 29.2%, 43.8%, 6.3%; 22.0%, 16.0%, 42.0%, 20.0%, 0.0%.]

PAGE 7

Data types and reproducibility
• Reproducibility is a key parameter in determining the need for long-term preservation of data:
– Stable (S): derives from simulations, reductions, measurements
– Ephemeral (E): cannot be reproduced or reconstructed as it is time-sensitive
– Costly (C): stable but costly to regenerate
Modified from: http://tinyurl.com/724hezs

PAGE 8

Data types and reproducibility
• Experimental data (S, E, C): from labs and equipment
• Observational data (E): captured in real time
• Derived data (S, E, C): after data mining and statistical processing
• Simulation data (S, E, C): data generated from modeling processes
• Software (S, E, C)
Modified from: http://tinyurl.com/724hezs

PAGE 9

Metadata/annotation must be added throughout the lifecycle
[Diagram of lifecycle stages, labeled "Metadata": Study Concept, Data Collection, Data Processing, Data Distribution, Data Archiving, Data Discovery, Data Analysis, Repurposing]

PAGE 10

What exactly is metadata again?
• Descriptive information that helps you and others understand your data
• “Data about data” that acts as a surrogate for your data when you or others are trying to:
– Find the data later
– Know what the data is later
– Share the data later

PAGE 11

How are your data labeled or annotated?
[Bar chart: percentage of respondents (n=52). Categories: My data are not annotated; Referentially, with an associated codebook; Manually, by a member of my research team; Automatically, through data collection tool. Data labels in extraction order: 17.3%, 21.2%, 78.8%, 32.7%.]

PAGE 12

Metadata across the disciplines
Basic information to keep:
• Descriptive
– What is it about?
– Title, time, author, keywords
– Relations to other data objects
• Administrative
– Ownership and use permissions
• Provenance
– Where does it come from?
– History of changes to the data, versions
More specific information varies by discipline
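The three groups of basic information above (descriptive, administrative, provenance) can be sketched as a small record stored alongside a dataset. This is only an illustration, not a method from the presentation: the class and field names are hypothetical, and the keyword, source, and history values are made-up placeholders. The title, authors, date, and license come from this record.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Descriptive:
    """What is it about?"""
    title: str
    authors: List[str]
    created: str                                        # ISO 8601 date
    keywords: List[str] = field(default_factory=list)
    related: List[str] = field(default_factory=list)    # relations to other data objects


@dataclass
class Administrative:
    """Ownership and use permissions."""
    owner: str
    use_permissions: str


@dataclass
class Provenance:
    """Where does it come from, and how has it changed?"""
    source: str
    version: str
    history: List[str] = field(default_factory=list)


@dataclass
class DatasetMetadata:
    descriptive: Descriptive
    administrative: Administrative
    provenance: Provenance

    def to_json(self) -> str:
        # Serialize the nested record so it can be saved next to the data file.
        return json.dumps(asdict(self), indent=2)


record = DatasetMetadata(
    descriptive=Descriptive(
        title="Data Lifecycle Management",
        authors=["Norton, Hannah F.", "Garcia-Milian, Rolando"],
        created="2012-04-25",
        keywords=["data management"],            # hypothetical keyword
    ),
    administrative=Administrative(
        owner="University of Florida",
        use_permissions="Creative Commons Attribution License",
    ),
    provenance=Provenance(
        source="Self-submitted presentation",    # hypothetical wording
        version="1",
        history=["Initial deposit"],             # hypothetical history entry
    ),
)

print(record.to_json())
```

Keeping the three groups as separate classes mirrors the slide's categories; discipline-specific fields would be added on top of these.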

PAGE 13

Standards
• Where possible, use standard data formats and metadata formats.
– Saves you time, saves the data users time.
– The tricky part is finding the right standard.
• Ontologies and controlled vocabularies can also help standardize the contents of your metadata and make it easier to understand.

PAGE 14

Sample metadata standards
• Dublin Core
• Darwin Core
• METS (Metadata Encoding and Transmission Standard)
• FGDC (Federal Geographic Data Committee)
• DDI (Data Documentation Initiative)
• ABCD (Access to Biological Collections Data)
• AVMS (Astronomy Visualization Metadata Standard)
• CSDGM (Content Standard for Digital Geospatial Metadata)
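As an illustration of the first standard listed, the sketch below writes a simple Dublin Core description of this presentation as XML using Python's standard library. The element names (title, creator, date, type, rights, identifier) belong to the Dublin Core element set and the values come from this record. The wrapper element name `metadata` and the helper `dublin_core_record` are assumptions made for the example, not something the slides prescribe.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields):
    """Build an XML element holding one dc:* child per value.

    `fields` maps a Dublin Core element name to a list of values,
    because elements such as creator may repeat.
    The root element name "metadata" is a hypothetical wrapper.
    """
    root = ET.Element("metadata")
    for name, values in fields.items():
        for value in values:
            child = ET.SubElement(root, "{%s}%s" % (DC_NS, name))
            child.text = value
    return root


fields = {
    "title": ["Data Lifecycle Management"],
    "creator": ["Norton, Hannah F.", "Garcia-Milian, Rolando"],
    "date": ["2012-04-25"],
    "type": ["Presentation"],
    "rights": ["Creative Commons Attribution License"],
    "identifier": ["http://ufdc.ufl.edu/IR00000801/00001"],
}

xml_text = ET.tostring(dublin_core_record(fields), encoding="unicode")
print(xml_text)
```

A repository would normally define its own wrapper and required elements, so a real deposit should follow that repository's schema rather than this sketch.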

PAGE 15

From: http://www.ncecho.org/dig/ncdc2007.shtml#6

PAGE 18

Sample ontologies/controlled vocabularies
• LCSH (Library of Congress Subject Headings)
• MeSH (Medical Subject Headings)
• Gene Ontology
• Plant Ontology
• International Standard Classification of Education
• NASA Thesaurus
• Multilingual Thesaurus of the Geosciences
• SNOMED Clinical Terms

PAGE 20

Source: http://xkcd.com/927/

PAGE 21

Finding a home for your data
• Data storage, both short-term and long-term, can take place in 3 types of places:
– Locally, within the lab or research environment
– Within the institution
– Within a national/discipline-based repository

PAGE 22

How do you store your data?
[Bar chart: percentage of respondents (n=52). Categories: Other; Discipline-specific database, e.g. NCBI (National Center for Biotechnology Information); Professional organization/association storage (e.g. ICPSR, available with published findings); Institutional storage; College or departmental computer network; Online (e.g. Dropbox/Google Docs/Amazon cloud); External hard drive/CDs/DVDs; Personal laptop/desktop. Data labels in extraction order: 9.6%, 7.7%, 1.9%, 30.8%, 78.8%, 17.3%, 34.6%, 38.5%.]

PAGE 23

How are you sharing or planning to share your data?
[Bar chart: percentage of respondents (n=50). Categories: I do not share data; Making them available informally to peers on request; Making them available online via a project or institutional website; Depositing them in UF’s Institutional Repository (http://ufdc.ufl.edu/ir); Submitting them to a journal to support a publication; Depositing them in a discipline-specific data center or repository. Data labels in extraction order: 10.0%, 46.0%, 22.0%, 4.0%, 68.0%, 26.0%.]

PAGE 24

Repositories
Advantages of an institutional repository
• Linked to your institution – intellectual capital of the institution in one place
• You can put all your datasets together
• Some guarantee of support from the university
• Some domain repositories may “go out of business” once their funding ends
Advantages of a domain repository
• Your data will be stored with similar datasets
• Researchers will find your data easily
• The repository will understand what your data needs in terms of storage, archiving and preservation
• Computational tools may be developed to crunch a critical mass of data of a certain kind
Adapted from: http://libraries.mit.edu/guides/subjects/data-management/Managing%20Research%20Data%20101.pdf

PAGE 25

What about non-digital data?
Consider migrating to digital…

PAGE 26

Data Management Plans describe the whole data lifecycle.
[Diagram of lifecycle stages: Study Concept, Data Collection, Data Processing, Data Distribution, Data Archiving, Data Discovery, Data Analysis, Repurposing]

PAGE 27

What is a data management plan (DMP)?
• A clear description of how you plan to address data management issues in your research.
• A way to communicate your data management efforts to members of your team and others (especially funders).
A data management plan gives a concise description of the who, what, where, and when of your data throughout its life cycle.

PAGE 28

Why do you need a DMP?
For all the same reasons you should consider following data management best practices…
• To ensure that your valuable data resources will be accessible in the future to members of your team and the broader research community.
• To make your life easier – by planning ahead and documenting your data throughout its life cycle, you can save time and focus on your research.
• To increase the visibility of your research.
• To satisfy funders’ requirements.

PAGE 29

Components of a DMP
• Project description
• Data collection:
– Types of data
– Data and metadata standards to be used
• Legal and ethical issues:
– Privacy and confidentiality
– Intellectual property rights
• Policies for data sharing and re-use
• Data preservation (long-term)
• Who is responsible for data management
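The component list above can double as a checklist when drafting a plan. The following is a minimal sketch that reports which top-level components a draft has not yet addressed. The function name `missing_components` and the draft text are hypothetical, and funders may require sections beyond the ones listed on the slide.

```python
# Top-level DMP components, taken from the list above.
DMP_COMPONENTS = [
    "Project description",
    "Data collection",
    "Legal and ethical issues",
    "Policies for data sharing and re-use",
    "Data preservation (long-term)",
    "Who is responsible for data management",
]


def missing_components(draft):
    """Return the components whose section is absent or blank in `draft`.

    `draft` maps a component name to the text written for it so far.
    """
    return [name for name in DMP_COMPONENTS if not draft.get(name, "").strip()]


# Hypothetical partial draft, for illustration only.
draft = {
    "Project description": "Survey of researcher data practices.",
    "Data collection": "Survey responses as CSV, described with Dublin Core.",
    "Data preservation (long-term)": "   ",   # blank sections count as missing
}

gaps = missing_components(draft)
for name in gaps:
    print("Missing:", name)
```

A check like this only confirms that each section exists; whether the content satisfies a particular funder still needs a human read against that funder's guidance.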

PAGE 30

Funders’ data requirements
• National Science Foundation: http://www.nsf.gov/pubs/policydocs/pappguide/nsf11001/gpg_2.jsp#dmp
• National Institutes of Health: http://grants.nih.gov/grants/policy/data_sharing/
• Centers for Disease Control: http://www.cdc.gov/od/foia/policies/sharing.htm
• NASA Earth Science: http://science.nasa.gov/earth-science/earth-science-data/data-information-policy/
• Environmental Protection Agency: http://www.epa.gov/quality/informationguidelines/documents/EPA_InfoQualityGuidelines.pdf

PAGE 31

DMP Templates and Tools
Templates can give you a place to start, as long as you customize them for your project.
• HPC Center links: http://www.hpc.ufl.edu/proposals/
• https://dmp.cdlib.org/

PAGE 35

On-campus resources
Your data partners:
• Research Computing/HPC Center
• UF Libraries
– Institutional Repository
Other data-related resources:
• Division of Sponsored Research
• Integrated Data Repository
• REDCap
• Office of Technology Licensing
• Information Security Office
• Intellectual Property Policy

PAGE 36

http://guides.uflib.ufl.edu/datamanagement

PAGE 37

Feel free to contact us:
• Hannah Norton, nortonh@ufl.edu, 273-8412
• Rolando Garcia-Milian, rolando.milian@ufl.edu, 273-8440
UF Health Science Center Libraries
This presentation is available for re-use under a Creative Commons Attribution license.

PAGE 38

References
• Data Documentation Initiative (DDI) version 3.0 Combined Life Cycle Model: http://www.ddialliance.org/what
• Alex Ball. (2012). Review of Data Management Lifecycle Models (version 1.0). REDm-MEC Project Document redm1rep120110ab10. Bath, UK: University of Bath. Available at http://opus.bath.ac.uk/28587/1/redm1rep120110ab10.pdf
• Deumens E, Taylor LNF, Schipper RA, Botero C, Garcia-Milian R, Norton HF, Tennant MR, Acord SK, Barnes CP. (2011). “Research Data Lifecycle Management: Tools and guidelines”, position paper, Workshop on Research Data Lifecycle Management. Available at http://www.columbia.edu/~rb2568/rdlm/Deumens_UF_RDLM2011.pdf
• A. Collie. (2005). “NSB Long-Lived Data Collections: Enabling Research and Education in the 21st Century.” Available at http://www.nsf.gov/pubs/2005/nsb0540/nsb0540.pdf
• Texas Advanced Computing Center. (2012). “Writing a Data Management Plan: A guide for the perplexed.” Available at http://www.tacc.utexas.edu/c/document_library/get_file?uuid=e9774145-9801-4049-b324-a1b0d6e635ca&groupId=13601
• University of Wisconsin Research Data Services. Data Plan Essentials: http://researchdata.wisc.edu/make-a-plan/data-plans/

PAGE 39

References
• MIT Libraries. Data Management and Publishing: http://libraries.mit.edu/guides/subjects/data-management/index.html
• JA Lyon, N Ferree, H Norton, MR Tennant. “Electronic capture and analysis of librarian-mediated literature searches in the health sciences”, contributed presentation, 6th Evidence Based Library and Information Practice conference, Sheffield, U.K., June 28, 2011.
• Dublin Core Metadata Initiative: http://dublincore.org/
• Darwin Core Biodiversity Information Standards: http://rs.tdwg.org/dwc/
• Metadata Encoding & Transmission Standard: http://www.loc.gov/standards/mets/
• Federal Geographic Data Committee: http://www.fgdc.gov/
• ABCD Schema – Task Group on Access to Biological Collection Data: http://www.bgbm.org/tdwg/codata/schema/
• Astronomy Visualization Metadata Standard: http://www.jodcast.net/avm/microformat.html
• Content Standard for Digital Geospatial Metadata: http://www.fgdc.gov/metadata/csdgm/