PAGE 1
Data Lifecycle Management
Hannah Norton and Rolando Garcia-Milian
UF Health Science Center Libraries
Image credit: http://www.flickr.com/photos/blprnt/3642742876/in/photostream/

PAGE 2
Agenda
  The data lifecycle and data types
  Metadata and labeling your data
  Storage and preservation
  Data management planning
  Additional resources

PAGE 3
Data Lifecycle*
  Study Concept
  Data Collection
  Data Processing
  Data Distribution
  Data Archiving
  Data Discovery
  Data Analysis
  Repurposing
* Based on Data Documentation Initiative (DDI) version 3.0 Combined Life Cycle Model

PAGE 5
Data generated throughout the lifecycle has different needs
  Raw data: some must be kept forever; others can be discarded after the project is complete
  Intermediate data (for analyzing and processing): can often be discarded at the end of the computation, but computational methods should be kept for reproducibility
  Final data: should be made available indefinitely to the community

PAGE 6
How long do you need your data stored?
[Bar chart. Axis: Percentage of Respondents. Categories: Forever; More than 10 years; 6-10 years; 1-5 years; Less than a year. Series: Raw Data (n=50); Intermediate/Working Data (n=48); Processed Data (n=49). Data labels as extracted: 18.4%, 18.4%, 42.9%, 18.4%, 2.0%; 8.3%, 12.5%, 29.2%, 43.8%, 6.3%; 22.0%, 16.0%, 42.0%, 20.0%, 0.0%]

PAGE 7
Data types and reproducibility
Reproducibility is a key parameter in determining the need for long-term preservation of data:
  Stable (S): derives from simulations, reductions, measurements
  Ephemeral (E): cannot be reproduced or reconstructed, as it is time sensitive
  Costly (C): stable but costly to regenerate
Modified from: http://tinyurl.com/724hezs

PAGE 8
Data types and reproducibility
  Experimental data (S, E, C): from labs and equipment
  Observational data (E): captured in real time
  Derived data (S, E, C): after data mining and statistical processing
  Simulation data (S, E, C): data generated from modeling processes
  Software (S, E, C)
Modified from: http://tinyurl.com/724hezs

PAGE 9
Metadata/annotation must be added throughout the lifecycle
  Lifecycle stages: Study Concept, Data Collection, Data Processing, Data Distribution, Data Archiving, Data Discovery, Data Analysis, Repurposing
  Metadata

PAGE 10
What exactly is metadata again?
  Descriptive information that helps you and others understand your data
  Data about data that acts as a surrogate for your data when you or others are trying to:
    Find the data later
    Know what the data is later
    Share the data later

PAGE 11
How are your data labeled or annotated?
[Bar chart. Axis: Percentage of Respondents, n=52. Categories: My data are not annotated; Referentially, with an associated codebook; Manually, by a member of my research team; Automatically, through data collection tool. Data labels as extracted: 17.3%, 21.2%, 78.8%, 32.7%]

PAGE 12
Metadata across the disciplines
Basic information to keep:
  Descriptive
    What is it about?
    Title, time, author, keywords
    Relations to other data objects
  Administrative
    Ownership and use permissions
  Provenance
    Where does it come from?
    History of changes to the data, versions
More specific information varies by discipline

PAGE 13
Standards
Where possible, use standard data formats and metadata formats. This saves you time and saves the data users time.
The tricky part is finding the right standard.
Ontologies and controlled vocabularies can also help standardize the contents of your metadata and make it easier to understand.

PAGE 14
Sample metadata standards
  Dublin Core
  Darwin Core
  METS (Metadata Encoding and Transmission Standard)
  FGDC (Federal Geographic Data Committee)
  DDI (Data Documentation Initiative)
  ABCD (Access to Biological Collections Data)
  AVMS (Astronomy Visualization Metadata Standard)
  CSDGM (Content Standard for Digital Geospatial Metadata)

PAGE 15
From: http://www.ncecho.org/dig/ncdc2007.shtml#6

PAGE 18
Sample ontologies/controlled vocabularies
  LCSH (Library of Congress Subject Headings)
  MeSH (Medical Subject Headings)
  Gene Ontology
  Plant Ontology
  International Standard Classification of Education
  NASA Thesaurus
  Multilingual Thesaurus of the Geosciences
  SNOMED Clinical Terms

PAGE 20
Source: http://xkcd.com/927/

PAGE 21
Finding a home for your data
Data storage, both short-term and long-term, can take place in 3 types of places:
  Locally, within the lab or research environment
  Within the institution
  Within a national/discipline-based repository

PAGE 22
How do you store your data?
[Bar chart. Axis: Percentage of Respondents, n=52. Categories: Other; Discipline-specific database, e.g. NCBI (National Center for Biotechnology Information); Professional organization/association storage (e.g. ICPSR, available with published findings); Institutional storage; College or departmental computer network; Online (e.g. Dropbox/Google Docs/Amazon cloud); External hard drive/CDs/DVDs; Personal laptop/desktop. Data labels as extracted: 9.6%, 7.7%, 1.9%, 30.8%, 78.8%, 17.3%, 34.6%, 38.5%]

PAGE 23
How are you sharing or planning to share your data?
[Bar chart. Axis: Percentage of Respondents, n=50. Categories: I do not share data; Making them available informally to peers on request; Making them available online via a project or institutional website; Depositing them in UF's Institutional Repository (http://ufdc.ufl.edu/ir); Submitting them to a journal to support a publication; Depositing them in a discipline-specific data center or repository. Data labels as extracted: 10.0%, 46.0%, 22.0%, 4.0%, 68.0%, 26.0%]

PAGE 24
Repositories
Advantages of an institutional repository
  Linked to your institution: intellectual capital of the institution in one place
  You can put all your datasets together
  Some guarantee of support from the university
  Some domain repositories may go out of business once their funding ends
Advantages of a domain repository
  Your data will be stored with similar datasets
  Researchers will find your data easily
  The repository will understand what your data needs in terms of storage, archiving and preservation
  Computational tools may be developed to crunch a critical mass of data of a certain kind
Adapted from: http://libraries.mit.edu/guides/subjects/data-management/Managing%20Research%20Data%20101.pdf

PAGE 25
What about non-digital data?
Consider migrating to digital

PAGE 26
Data Management Plans describe the whole data lifecycle.
  Lifecycle stages: Study Concept, Data Collection, Data Processing, Data Distribution, Data Archiving, Data Discovery, Data Analysis, Repurposing

PAGE 27
What is a data management plan (DMP)?
  A clear description of how you plan to address data management issues in your research.
  A way to communicate your data management efforts to members of your team and others (especially funders).
  A data management plan gives a concise description of the who, what, where, and when of your data throughout its life cycle.
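
The who, what, where, and when that a data management plan records can be sketched as a small data structure. The Python example below is only an illustration under stated assumptions: the class name `DatasetPlanEntry`, its field names, the helper `missing_answers`, and the sample values are all hypothetical and do not come from the presentation or from any funder's template.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class DatasetPlanEntry:
    """Hypothetical record of one dataset in a data management plan."""
    who: str = ""    # person responsible for managing the data
    what: str = ""   # short description of the data
    where: str = ""  # storage or repository location
    when: str = ""   # retention period or lifecycle stage


def missing_answers(entry: DatasetPlanEntry) -> List[str]:
    """Return the names of the questions that are still blank."""
    return [f.name for f in fields(entry) if not getattr(entry, f.name).strip()]


# Illustrative values only; none of these describe a real project.
draft = DatasetPlanEntry(
    who="Lab data manager",
    what="Raw instrument readings",
    where="",
    when="Keep 6-10 years",
)
print(missing_answers(draft))  # ['where']
```

A check of this kind could be run on a draft plan before it is shared with the team, so that unanswered questions are caught early.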
PAGE 28
Why do you need a DMP?
For all the same reasons you should consider following data management best practices:
  To ensure that your valuable data resources will be accessible in the future to members of your team and the broader research community.
  To make your life easier: by planning ahead and documenting your data throughout its life cycle, you can save time and focus on your research.
  To increase the visibility of your research.
  To satisfy funders' requirements.

PAGE 29
Components of a DMP
  Project description
  Data collection:
    Types of data
    Data and metadata standards to be used
  Legal and ethical issues:
    Privacy and confidentiality
    Intellectual property rights
  Policies for data sharing and re-use
  Data preservation (long-term)
  Who is responsible for data management

PAGE 30
Funders' data requirements
  National Science Foundation: http://www.nsf.gov/pubs/policydocs/pappguide/nsf11001/gpg_2.jsp#dmp
  National Institutes of Health: http://grants.nih.gov/grants/policy/data_sharing/
  Centers for Disease Control: http://www.cdc.gov/od/foia/policies/sharing.htm
  NASA Earth Science: http://science.nasa.gov/earth-science/earth-science-data/data-information-policy/
  Environmental Protection Agency: http://www.epa.gov/quality/informationguidelines/documents/EPA_InfoQualityGuidelines.pdf

PAGE 31
DMP Templates and Tools
Templates can give you a place to start, as long as you customize them for your project.
HPC Center links:
  http://www.hpc.ufl.edu/proposals/
  https://dmp.cdlib.org/

PAGE 35
On-campus resources
Your data partners:
  Research Computing/HPC Center
  UF Libraries
  Institutional Repository
Other data-related resources:
  Division of Sponsored Research
  Integrated Data Repository
  REDCap
  Office of Technology Licensing
  Information Security Office
  Intellectual Property Policy

PAGE 36
http://guides.uflib.ufl.edu/datamanagement

PAGE 37
Feel free to contact us:
  Hannah Norton, nortonh@ufl.edu, 273-8412
  Rolando Garcia-Milian, rolando.milian@ufl.edu, 273-8440
UF Health Science Center Libraries
This presentation is available for re-use under a Creative Commons Attribution license.

PAGE 38
References
  Data Documentation Initiative (DDI) version 3.0 Combined Life Cycle Model: http://www.ddialliance.org/what
  Alex Ball. (2012). Review of Data Management Lifecycle Models (version 1.0). REDm-MED Project Document redm1rep120110ab10. Bath, UK: University of Bath. Available at http://opus.bath.ac.uk/28587/1/redm1rep120110ab10.pdf
  Deumens E, Taylor LNF, Schipper RA, Botero C, Garcia-Milian R, Norton HF, Tennant MR, Acord SK, Barnes CP. (2011). Research Data Lifecycle Management: Tools and guidelines, position paper, Workshop on Research Data Lifecycle Management. Available at http://www.columbia.edu/~rb2568/rdlm/Deumens_UF_RDLM2011.pdf
  A. Collie. (2005). NSB Long-Lived Data Collections: Enabling Research and Education in the 21st Century. Available at http://www.nsf.gov/pubs/2005/nsb0540/nsb0540.pdf
  Texas Advanced Computing Center. (2012). Writing a Data Management Plan: A guide for the perplexed. Available at http://www.tacc.utexas.edu/c/document_library/get_file?uuid=e9774145-9801-4049-b324-a1b0d6e635ca&groupId=13601
  University of Wisconsin Research Data Services. Data Plan Essentials: http://researchdata.wisc.edu/make-a-plan/data-plans/

PAGE 39
References
  MIT Libraries. Data Management and Publishing: http://libraries.mit.edu/guides/subjects/data-management/index.html
  JA Lyon, N Ferree, H Norton, MR Tennant. Electronic capture and analysis of librarian-mediated literature searches in the health sciences, contributed presentation, 6th Evidence Based Library and Information Practice conference, Sheffield, U.K., June 28, 2011.
  Dublin Core Metadata Initiative: http://dublincore.org/
  Darwin Core Biodiversity Information Standards: http://rs.tdwg.org/dwc/
  Metadata Encoding & Transmission Standard: http://www.loc.gov/standards/mets/
  Federal Geographic Data Committee: http://www.fgdc.gov/
  ABCD Schema Task Group on Access to Biological Collection Data: http://www.bgbm.org/tdwg/codata/schema/
  Astronomy Visualization Metadata Standard: http://www.jodcast.net/avm/microformat.html
  Content Standard for Digital Geospatial Metadata: http://www.fgdc.gov/metadata/csdgm/