
Data management 101: General Guidelines for Effective Data Management

University of Florida Institutional Repository UFHSC
Permanent Link: http://ufdc.ufl.edu/IR00000802/00001

Material Information

Title: Data management 101: General Guidelines for Effective Data Management
Physical Description: Slide Presentation
Creator: Garcia-Milian, Rolando
Norton, Hannah F.
Publication Date: 2012

Notes

Abstract: This presentation provides guidelines for research data management, including data management plans, file formatting, backup, data repositories, and metadata standards. More on this topic can be found at http://guides.uflib.ufl.edu/aecontent.php?pid=326281
Acquisition: Collected for University of Florida's Institutional Repository by the UFIR Self-Submittal tool. Submitted by Rolando Milian.
Publication Status: Unpublished
General Note: Presented at University of Florida Research Computing Day, April 25, 2012.
Funding: This project has been funded in part with federal funds from the National Library of Medicine, National Institutes of Health, under Contract # HHS-N-276-2011-00004-C.

Record Information

Source Institution: University of Florida Institutional Repository
Holding Location: University of Florida
Rights Management: All rights reserved by the submitter.
System ID: IR00000802:00001



Full Text

PAGE 1

Data management 101: General Guidelines for Effective Data Management
Rolando Garcia-Milian and Hannah Norton
UF Health Sciences Center Library
UF Research Computing Day, April 25, 2012
Rolando.milian@ufl.edu / nortonh@ufl.edu

PAGE 2

General Guidelines / Best Practices
-Planning (DMP; see Norton’s presentation)
-Metadata
-Formatting
-Storing
-Security
-Copyright
-Sharing

PAGE 3

Benefits of proper data management
-Data are the evidence supporting or refuting scientific models
-Efficient use of resources
-Effective protection
-Preservation and reuse through data sharing and collaboration
-High-quality results
-Research excellence
-Advancing science

PAGE 4

Challenges of data management
-Planning
-Organization
-Documenting
-Formatting
-Submitting
-Answering questions?
-Data errors/mistakes?
-Being scooped?
-Public resistance?

PAGE 5

Tools for data management

PAGE 6

Results of poor data management
From: Horner J., and Minifie F.D. 2011. Research Ethics II: Mentoring, Collaboration, Peer Review, and Data. Journal of Speech, Language, and Hearing Research 54: S330–S345

PAGE 7

Metadata (Annotation/Documenting)

PAGE 8

Metadata (Annotation/Documenting)
Metadata: information about data; the information required to understand data, their context, quality, structure, and accessibility (Michener et al., 1997)
-The who, what, when, where, and how of every aspect of the data

PAGE 9

Metadata (Annotation/Documenting)
Benefits of proper metadata
-Facilitates data reuse and sharing
-Supports data discovery
-Expands the scale of a study
-Helps address unanticipated questions
-Allows data to be integrated
http://www.flickr.com/photos/boojee/3743753784/in/photostream/

PAGE 10

Metadata (Annotation/Documenting)
Use standardized taxonomies and controlled vocabularies, including domain, national, and international standards, in the capture, management, and archiving of data.
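Controlled vocabularies are easiest to enforce at the point of data entry. The following Python sketch checks records against a small vocabulary for a `habitat` field and requires ISO 8601 calendar dates (YYYY-MM-DD). The field names, the vocabulary terms, and the `validate_record` helper are hypothetical illustrations, not part of any published standard; a real project would take its terms from the relevant domain, national, or international standard.

```python
from datetime import date

# Hypothetical controlled vocabulary for one field; a real project would
# take these terms from a domain, national, or international standard.
HABITAT_TERMS = {"forest", "grassland", "wetland", "urban"}

def validate_record(record):
    """Return a list of problems found in one data record (empty if none)."""
    problems = []
    habitat = record.get("habitat", "").strip().lower()
    if habitat not in HABITAT_TERMS:
        problems.append(f"habitat {record.get('habitat')!r} is not in the controlled vocabulary")
    try:
        # ISO 8601 calendar dates (YYYY-MM-DD) are one widely used date standard.
        date.fromisoformat(record.get("sample_date", ""))
    except ValueError:
        problems.append(f"sample_date {record.get('sample_date')!r} is not an ISO 8601 date")
    return problems

records = [
    {"habitat": "Wetland", "sample_date": "2012-04-25"},
    {"habitat": "swampy area", "sample_date": "04/25/2012"},
]
for r in records:
    print(r, validate_record(r))
```

Normalizing case before the lookup treats "Wetland" and "wetland" as the same term, while free text such as "swampy area" and a US-style date are flagged for review.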

PAGE 11

Metadata (Annotation/Documenting)
Automatic addition of metadata
-Some metadata are added automatically during data collection or analysis, e.g. date and time
-Some software (e.g. the R statistical package, MATLAB, SAS, Galaxy) records the various steps involved in processing and analyzing data in analysis scripts, providing a form of “analytical metadata.” Script-based analysis always leaves a record of what you did with your data.
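To illustrate what “analytical metadata” from a script can look like, here is a minimal Python sketch that records each processing step as it runs. It does not reproduce how R, MATLAB, SAS, or Galaxy track analyses; the `log_step` helper, the measurements, and the -999.0 missing-value code are all hypothetical.

```python
import json
import statistics
from datetime import datetime, timezone

# Each processing step appends an entry here, so the script itself
# produces a record of what was done to the data.
provenance = []

def log_step(description, **details):
    """Hypothetical helper: record one processing step with a UTC timestamp."""
    provenance.append({
        "time": datetime.now(timezone.utc).isoformat(),
        "step": description,
        **details,
    })

raw = [12.1, 11.8, -999.0, 12.4, 13.0]  # -999.0 is an assumed missing-value code
log_step("loaded raw measurements", n=len(raw))

clean = [x for x in raw if x != -999.0]
log_step("removed missing-value code -999.0", n_removed=len(raw) - len(clean))

mean = statistics.mean(clean)
log_step("computed mean", mean=round(mean, 3))

# Save or print the record alongside the results.
print(json.dumps(provenance, indent=2))
```

Because the log is written by the same code that changes the data, it cannot drift out of step with what was actually done, unlike a sequence of menu selections.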

PAGE 12

Metadata (Annotation/Documenting)
User-interface-driven analysis
-Changes to data are made by selecting steps from drop-down menus, followed by a “run,” “execute,” or “OK” button
-This approach rarely leaves a clear accounting of exactly what you have done

PAGE 13

Metadata (Annotation/Documenting)
Manually added metadata
About the project
-Title, people, key dates, funders and grants
About the data
-Title, key dates, creator(s), subjects, rights, included files, format(s), versions
-Interpretive aids: codebooks, data dictionaries, algorithms, code

PAGE 14

Metadata (Annotation/Documenting)
Keep a README file for each data file
-Use plain text files
-A short description of the data the file includes
-Who collected the data and whom to contact with questions
-Column headings for any tabular data
-Units of measurement used
-Symbols used
-Specialized formats or abbreviations used
http://datadryad.org/handle/10255/dryad.8525
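A README can be generated from a few fields so that none of the items above is forgotten. This Python sketch writes one plain-text README per data file; the `_README.txt` naming convention, the `write_readme` helper, and all example values are assumptions for illustration.

```python
import tempfile
from pathlib import Path

def write_readme(data_file, description, collector, contact, columns, symbols=None):
    """Write a plain-text README next to data_file and return its path.

    columns maps each column heading to a (meaning, unit) pair;
    symbols optionally maps each symbol or abbreviation to its meaning.
    """
    data_path = Path(data_file)
    readme_path = data_path.with_name(data_path.stem + "_README.txt")
    lines = [
        f"README for {data_path.name}",
        "",
        f"Description: {description}",
        f"Collected by: {collector}",
        f"Contact for questions: {contact}",
        "",
        "Columns (heading: meaning [unit]):",
    ]
    for heading, (meaning, unit) in columns.items():
        lines.append(f"  {heading}: {meaning} [{unit}]")
    if symbols:
        lines += ["", "Symbols and abbreviations:"]
        lines += [f"  {sym} = {meaning}" for sym, meaning in symbols.items()]
    readme_path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return readme_path

# Example with made-up values, written to a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    readme = write_readme(
        Path(tmp) / "Corvallis_VegBiodiv_2007.csv",
        description="Plant cover from hypothetical survey plots",
        collector="J. Researcher",
        contact="j.researcher@example.edu",
        columns={"plot_id": ("survey plot identifier", "none"),
                 "cover": ("vegetation cover", "%")},
        symbols={"NA": "not measured"},
    )
    print(readme.read_text(encoding="utf-8"))
```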

PAGE 15

Formatting Your Data
http://www.ehow.co.uk/how_8510149_make-excel-spreadsheets-look-good.html

PAGE 16

Formatting Your Data
The file formats in which data are created depend on:
-The software in which research data are created and digitized
-How researchers plan to analyze the data
-The hardware used
-The availability of software
-Discipline-specific conventions

PAGE 17

Formatting Your Data
Organizing files and folders:
-Is essential for accessibility
-Makes it easier to find and keep track of data files
-Develop a system that works for your project
-Be consistent
http://jdorganizer.blogspot.com/2008/03/file-folders-declare-that-youare.html
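One way to stay consistent is to create the same folder skeleton for every project with a script. The layout below (`data/raw`, `data/processed`, `scripts`, `docs`, `results`) and the `create_project` helper are hypothetical examples, not a UF or library standard; substitute whatever system works for your project.

```python
import tempfile
from pathlib import Path

# A hypothetical layout; the point is to pick one and reuse it consistently.
PROJECT_LAYOUT = [
    "data/raw",        # original, uncorrected data
    "data/processed",  # data derived from data/raw by scripts
    "scripts",         # cleaning and analysis code
    "docs",            # README files, codebooks, data dictionaries
    "results",         # figures and tables
]

def create_project(root):
    """Create the standard folders under root; return them as sorted relative paths."""
    root = Path(root)
    for sub in PROJECT_LAYOUT:
        (root / sub).mkdir(parents=True, exist_ok=True)
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_dir())

with tempfile.TemporaryDirectory() as tmp:
    for folder in create_project(Path(tmp) / "VegBiodiv2007"):
        print(folder)
```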

PAGE 18

Formatting Your Data
File names:
-Use file names to classify broad types of files
-Create meaningful but brief names, e.g. “Corvallis_VegBiodiv_2007” rather than “Year01” or “Fall03”
-Capitalize the first letter of each word to separate words
-Avoid special characters in file names: \ / : ? " < > | [ ] & $
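The naming rules on this slide can be checked mechanically. This Python sketch joins words into a capitalized (CamelCase) name part and reports any of the listed special characters; `name_part` and `forbidden_chars` are hypothetical helpers, not part of any library.

```python
# Characters the slide lists as unsafe in file names: \ / : ? " < > | [ ] & $
FORBIDDEN = set('\\/:?"<>|[]&$')

def name_part(words):
    """Join words into one CamelCase file-name part, e.g. 'veg biodiv' -> 'VegBiodiv'."""
    cleaned = "".join(ch for ch in words if ch not in FORBIDDEN)
    # Capitalize only the first letter of each word and keep the rest unchanged.
    return "".join(w[:1].upper() + w[1:] for w in cleaned.split())

def forbidden_chars(name):
    """Return the forbidden characters found in a proposed file name, sorted."""
    return sorted(set(name) & FORBIDDEN)

print(name_part("veg & biodiv"))          # special character dropped
print(forbidden_chars("Fall03?.csv"))     # flags the '?'
```

Capitalizing only the first letter (rather than using `str.title()`) leaves the rest of each word alone, so an abbreviation such as "GPS" stays "GPS" instead of becoming "Gps".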

PAGE 19

Formatting Your Data
File names:
-Use underscores (“_”) or hyphens (“-”) instead of spaces
-Capture place, time, and theme; this is extremely useful, even if done in a highly abbreviated manner
-Write dates in reverse order (YYYYMMDD) so they sort chronologically, e.g. filenaming_20080507
-Capture version control with v01, v02, v03 instead of names like filenaming_latestversion
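The rules above can be combined into a small name builder. This Python sketch assumes a place_theme_YYYYMMDD_vNN.ext pattern; the field order, the two-digit version number, and the `build_filename` helper are illustrative choices, not a prescribed standard.

```python
from datetime import date

def build_filename(place, theme, day, version, ext="csv"):
    """Compose place_theme_YYYYMMDD_vNN.ext (an assumed pattern)."""
    for part in (place, theme):
        if " " in part:
            raise ValueError(f"use '_' or '-' instead of spaces: {part!r}")
    # Reverse date order and zero-padded versions make plain text sorting work.
    return f"{place}_{theme}_{day:%Y%m%d}_v{version:02d}.{ext}"

names = [
    build_filename("Corvallis", "VegBiodiv", date(2008, 5, 7), 2),
    build_filename("Corvallis", "VegBiodiv", date(2007, 11, 30), 1),
    build_filename("Corvallis", "VegBiodiv", date(2008, 5, 7), 1),
]
for n in sorted(names):
    print(n)
```

Sorting the three names puts them in chronological order with versions in sequence: Corvallis_VegBiodiv_20071130_v01.csv, then Corvallis_VegBiodiv_20080507_v01.csv, then Corvallis_VegBiodiv_20080507_v02.csv. The zero-padded version works up to v99; a project expecting more versions would need a wider field.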

PAGE 20

Formatting Your Data for Storage
Store data in nonproprietary software formats (e.g. comma-delimited text files, .csv). Proprietary software (e.g. Excel, Access) may become unavailable, leaving files in its formats unreadable, whereas text files can always be read.
NOTE: When data are converted from one format to another, certain changes may occur to the data. After conversion, check the data for errors or changes that the process may have introduced.
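A round-trip check is one simple way to look for changes introduced by a conversion. The Python sketch below uses the standard `csv` module to write a few made-up rows as comma-separated text, reads them back, and compares. One change always appears: CSV stores only text, so numbers come back as strings and must be converted before comparing. The data and the `values_match` helper are hypothetical.

```python
import csv
import io

rows = [
    {"site": "A1", "count": 12, "temp_c": 21.5},
    {"site": "B2", "count": 0, "temp_c": 19.0},
]

# Write the rows as comma-separated text (an in-memory buffer stands in for a .csv file).
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["site", "count", "temp_c"])
writer.writeheader()
writer.writerows(rows)

# Read the text back.
buffer.seek(0)
read_back = list(csv.DictReader(buffer))

# Report every value whose representation changed during the conversion.
for original, copy in zip(rows, read_back):
    for key, value in original.items():
        if copy[key] != value:
            print(f"{key}: {value!r} became {copy[key]!r}")

def values_match(original, copy):
    """Compare after converting each text value back to the original value's type."""
    return all(type(v)(copy[k]) == v for k, v in original.items())

print("all values recovered:", all(values_match(o, c) for o, c in zip(rows, read_back)))
```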

PAGE 21

Formatting Your Data for Storage
Recommended File Formats for Preservation (University of Texas, http://repositories.lib.utexas.edu/recommended_file_formats)
Textual formats
-Acrobat PDF/A: .pdf
-Comma-Separated Values: .csv
-Open Office formats: .odt, .ods, .odp
-Plain Text (US-ASCII, UTF-8): .txt
-XML: .xml
Image/graphic formats
-JPEG: .jpg
-JPEG2000: .jp2
-PNG: .png
-SVG 1.1 (no Java binding): .svg
-TIFF: .tif, .tiff
Audio formats
-AIFF: .aif, .aiff
-WAVE: .wav
Video formats
-AVI (uncompressed): .avi
-Motion JPEG2000: .mj2, .mjp2
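The list above can be turned into a quick check of file names in a data folder. This Python sketch looks only at file extensions, which is an assumption with limits: an extension cannot tell you whether a .pdf is actually PDF/A or whether an .avi is uncompressed. The `preservation_category` helper is hypothetical.

```python
from pathlib import Path

# Extensions from the University of Texas list, grouped by media type.
RECOMMENDED = {
    "textual": {".pdf", ".csv", ".odt", ".ods", ".odp", ".txt", ".xml"},
    "image": {".jpg", ".jp2", ".png", ".svg", ".tif", ".tiff"},
    "audio": {".aif", ".aiff", ".wav"},
    "video": {".avi", ".mj2", ".mjp2"},
}

def preservation_category(filename):
    """Return the media type if the extension is on the list, else None."""
    ext = Path(filename).suffix.lower()
    for category, extensions in RECOMMENDED.items():
        if ext in extensions:
            return category
    return None

for name in ["survey_20120425.csv", "survey_20120425.xlsx", "plot01.TIF"]:
    print(name, "->", preservation_category(name) or "not on the list; consider converting")
```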

PAGE 22

Storing Your Data
http://blog.brickhousesecurity.com/wp-content/uploads/mystica_usb_flash_drive.png

PAGE 23

Storing Your Data
-Store data in nonproprietary hardware formats. Formats can rapidly become obsolete, and valuable data are effectively lost when they are trapped on old media such as 5.25-inch floppy disks.
-CD/DVD: the experiential life expectancy is 2 to 5 years, even though published life expectancies are often cited as 10 years, 25 years, or longer. Manufacturers claim that CD-R and DVD-R discs have a shelf life of 5 to 10 years before recording on them (U.S. National Archives).

PAGE 24

Storing Your Data
Always store an uncorrected version (the original data set), or master version, of your data file:
-Do not make any corrections to this file
-Make corrections using a scripting language
-Consider making your original data file read-only
-Limit access to this file
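A minimal Python sketch of the master-file practice: the original CSV is made read-only, and a correction is applied by a script that writes a separate corrected copy. The file names, the data-entry error, and the helpers `make_read_only` and `correct_copy` are hypothetical. File permissions guard against accidental edits; they do not replace access controls that limit who can see the file. On Windows, `os.chmod` only toggles the read-only flag.

```python
import csv
import os
import stat
import tempfile
from pathlib import Path

def make_read_only(path):
    """Clear the write-permission bits for owner, group, and others."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, mode & ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))

def correct_copy(master, corrected, fixes):
    """Write a corrected copy of a CSV master file without touching the master.

    fixes maps (row_index, column_name) to the replacement value; row_index
    counts data rows from 0, not counting the header.
    """
    with open(master, newline="", encoding="utf-8") as src:
        reader = csv.DictReader(src)
        rows = list(reader)
        fieldnames = reader.fieldnames
    for (row_index, column), value in fixes.items():
        rows[row_index][column] = value
    with open(corrected, "w", newline="", encoding="utf-8") as dst:
        writer = csv.DictWriter(dst, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)

with tempfile.TemporaryDirectory() as tmp:
    master = Path(tmp) / "survey_raw.csv"
    master.write_text("site,count\nA1,12\nB2,1200\n", encoding="utf-8")
    make_read_only(master)
    # Hypothetical data-entry error: row 1 should read 12, not 1200.
    fixed = Path(tmp) / "survey_corrected.csv"
    correct_copy(master, fixed, {(1, "count"): "12"})
    print(fixed.read_text(encoding="utf-8"))
```

Keeping the fix in a script means the correction itself is documented and can be re-run on the untouched master at any time.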

PAGE 25

Storing Your Data
-Whenever possible, use online storage (e.g. Dropbox) or institutional resources
http://www.hpc.ufl.edu/about/newStorage.php

PAGE 26

Storing Your Data
Regular backups protect against accidental data loss from:
-Hardware failure
-Software or media faults
-Virus infection or malicious hacking
-Power failure
-Human error
Ensure that areas and rooms used for data storage are structurally sound and free from the risk of flood and fire.
http://www.mathworks.com/matlabcentral/fileexchange/25464-virtual-backup-using-matlab
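Part of a backup routine can be scripted. This Python sketch copies a file to a backup folder under a timestamped name with `shutil.copy2` and confirms the copy by comparing SHA-256 checksums. The `backup` and `sha256_of` helpers and the naming scheme are assumptions; to protect against hardware failure, fire, and flood, the backup folder should sit on a different device and ideally in a different location.

```python
import hashlib
import shutil
import tempfile
from datetime import datetime
from pathlib import Path

def sha256_of(path):
    """Return the SHA-256 checksum of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def backup(source, backup_dir):
    """Copy source into backup_dir under a timestamped name and verify the copy."""
    source, backup_dir = Path(source), Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    target = backup_dir / f"{source.stem}_{stamp}{source.suffix}"
    shutil.copy2(source, target)  # copy2 also preserves file timestamps
    if sha256_of(source) != sha256_of(target):
        raise OSError(f"backup of {source} does not match the original")
    return target

with tempfile.TemporaryDirectory() as tmp:
    data = Path(tmp) / "survey_raw.csv"
    data.write_text("site,count\nA1,12\n", encoding="utf-8")
    copy = backup(data, Path(tmp) / "backups")
    print("verified backup:", copy.name)
```

Timestamps to the second keep older backups from being overwritten; two backups of the same file within one second would collide, so a frequently run job would need a finer-grained stamp.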

PAGE 27

Data Security
http://www.icc-service.net/wp-content/uploads/2010/07/data-storage.jpg

PAGE 28

Security
UF IT Data Security Standard: http://www.it.ufl.edu/policies/security/uf-it-sec-data.html
-Unrestricted Data: if made available to the public, will not harm an individual, group, or institution
-Sensitive Data: if made available to unauthorized users, may harm an individual, group, or institution
-Restricted Data: requires the highest level of protection, e.g. patient data, student data, security-related data such as passwords and risk assessments, and intellectual property

PAGE 29

Security
DATA SECURITY AND ACCESS
-Physical security
-Security of computer systems and files
-Network security
http://www.icc-service.net/wp-content/uploads/2010/07/data-storage.jpg
http://mrcheckout.net/wp-content/uploads/2010/11/datasecurity.jpg

PAGE 30

Security
When working with Restricted Data, AVOID:
-Storing data on workstations, portable devices, or removable media
-Sending data in email or instant messages
-Using data on unapproved websites
-Removing data from UF premises
Modified from Bergsma K. UF restricted data required training. http://infosec.ufl.edu/restricted-data/data-security-slides.pdf

PAGE 31

Security
Phone: 392-2061
Email: ufirt@ufl.edu
Web: http://infosec.ufl.edu/

PAGE 32

Security
UF Privacy Office
Susan Blair, Chief Privacy Officer
Office phone: 392-2094
Privacy Hotline: 866-876-4472
Email: privacy@ufl.edu
Web: http://privacy.ufl.edu/

PAGE 33

Security
DATA DISPOSAL
-For hard drives, simply deleting a file does not erase it on most systems. Files need to be overwritten to ensure they are effectively scrambled.
-External hard drives at the end of their life can be removed from their casings and disposed of securely through physical destruction.
-Shredders certified to an appropriate security level should be used to destroy paper and CD/DVD discs.
-Contact your IT person.
http://www.spectrumdatarecovery.com.au/content.aspx?cid=23&m=3
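The overwrite-then-delete idea can be sketched in Python as below. This is an illustration only, under a strong caveat: on solid-state drives, flash media, and journaling or copy-on-write file systems, the new bytes may land in a different physical location, so the original data can survive. For Restricted Data, follow UF procedures and contact your IT person. The `overwrite_and_delete` helper is hypothetical.

```python
import os
import tempfile

def overwrite_and_delete(path, passes=1):
    """Overwrite a file's contents with random bytes, then delete it.

    Caveat: on SSDs, flash drives, and journaling or copy-on-write file
    systems the old blocks may not be overwritten in place.
    """
    size = os.path.getsize(path)
    with open(path, "r+b") as f:
        for _ in range(passes):
            f.seek(0)
            remaining = size
            while remaining > 0:
                chunk = min(remaining, 65536)
                f.write(os.urandom(chunk))
                remaining -= chunk
            f.flush()
            os.fsync(f.fileno())  # ask the OS to push the new bytes to the device
    os.remove(path)

# Example on a throwaway file with made-up contents.
with tempfile.NamedTemporaryFile(delete=False, suffix=".csv") as tmp:
    tmp.write(b"site,count\nA1,12\n")
overwrite_and_delete(tmp.name)
print(os.path.exists(tmp.name))
```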

PAGE 34

Security http://infosec.ufl.edu/restricted-data/data-security-slides.pdf

PAGE 35

Copyright http://blog.unl.edu/dixon/files/2012/01/copyright.jpg

PAGE 36

Copyright
-In the case of collaborative research, copyright may be held jointly by various researchers or institutions.
-Secondary users of data must obtain copyright clearance from the rights holder before the data can be reproduced.
-Give credit to the data source, the data distributor, and the copyright holder.
-Data can be copied for non-commercial teaching or research purposes without infringing copyright, under the fair dealing concept, provided that the owner of the data is acknowledged.

PAGE 37

Copyright
-UF Intellectual Property Policy: http://www.research.ufl.edu/otl/pdf/ipp.pdf
-UF Office of Technology Licensing: http://www.research.ufl.edu/otl/index.html
-Christine Ross, Copyright on Campus: http://guides.uflib.ufl.edu/copyright

PAGE 38

Sharing Your Data
http://www.amazon.com/Sharing-Toddler-Tools-ElizabethVerdick/dp/1575423146/ref=sr_1_1?s=books&ie=UTF8&qid=1335134736&sr=1-1

PAGE 39

Sharing Your Data
WHY SHARE RESEARCH DATA
-Encourages scientific debate
-Promotes potential new uses of data
-Leads to new collaborations
-Allows improvement and validation of research methods
-Increases the impact and visibility of research
-Promotes the research study and its outcomes
-Is often required by journals and funding agencies
-Provides direct credit to the researcher

PAGE 40

Sharing Your Data
http://www.amazon.com/Sharing-Toddler-Tools-ElizabethVerdick/dp/1575423146/ref=sr_1_1?s=books&ie=UTF8&qid=1335134736&sr=1-1

PAGE 41

Sharing Your Data
HOW TO SHARE YOUR RESEARCH DATA
-Depositing with a specialist or discipline-specific data repository
-Submitting to a journal to support a publication
-Depositing in an institutional repository
-Making data available online via a project or institutional website
-Sharing informally between researchers on a peer-to-peer basis

PAGE 42

Sharing Your Data
A comprehensive list of data repositories by discipline: http://oad.simmons.edu/oadwiki/Data_repositories

PAGE 43

Sharing Your Data

PAGE 44

Sharing Your Data
Advantages of depositing data with a data repository
-Assurance that data meet set quality standards
-Safekeeping of data in a secure environment, with the ability to control access where required
-A standardized citation mechanism to acknowledge data
-Promotion of data to many users
-Online discovery of data through data catalogues
-Monitoring of secondary usage of data

PAGE 45

http://www.ithenticate.com/Portals/92785/images/researcher-science-plagiarism.jpg
UF Health Sciences Center Library
UF Office of Technology Licensing
UF High Performance Computing Center
UF Institutional Repository
UF Intellectual Property Policy
UF Information Security Office

PAGE 46

References
Bergsma K. UF Restricted Data Required Training. Slide presentation. http://infosec.ufl.edu/restricted-data/data-security-slides.pdf
Borer E.T., Seabloom E.W., Jones M.B., and Schildhauer M. 2009. Some simple guidelines for effective data management. Bulletin of the Ecological Society of America: 205-214.
Data Repositories. http://oad.simmons.edu/oadwiki/Data_repositories
Frequently Asked Questions (FAQs) about Optical Storage Media: Storing Temporary Records on CDs and DVDs. Records Managers. U.S. National Archives. http://www.archives.gov/records-mgmt/initiatives/temp-opmedia-faq.html
Horner J., and Minifie F.D. 2011. Research Ethics II: Mentoring, Collaboration, Peer Review, and Data. Journal of Speech, Language, and Hearing Research 54: S330–S345.

PAGE 47

References
Jones S., Ross S., and Ruusalepp R. 2009. Data Audit Framework Methodology, draft for discussion, version 1.8. Glasgow: HATII, May 2009.
Kruse R.L., and Mehr D.R. 2008. Data management for prospective research studies using SAS Software. BMC Medical Research Methodology 8: 61.
Michener W.K., Brunt J.W., Helly J., Kirchner T.B., and Stafford S.G. 1997. Non-geospatial metadata for the ecological sciences. Ecological Applications 7: 330–342.
Michener W.K. 2006. Meta-information concepts for ecological data management. Ecological Informatics 1(1): 3–7.
North Carolina Government Records Branch. Best practices for file-naming. www.records.ncdcr.gov/erecords/filenaming_20080508_final.pdf

PAGE 48

References
Recommended file formats for long-term preservation. University of Texas. http://repositories.lib.utexas.edu/recommended_file_formats
Savage C.J., and Vickers A.J. 2009. Empirical study of data sharing by authors publishing in PLoS journals. PLoS ONE 4(9): e7078. doi:10.1371/journal.pone.0007078
UK Joint Information Systems Committee. Choosing a file name. www.jiscdigitalmedia.ac.uk/crossmedia/advice/choosing-a-file-name
University of Edinburgh Records Management Section. Standard Naming Conventions for Electronic Records: The Rules. www.recordsmanagement.ed.ac.uk/InfoStaff/RMstaff/RMprojects/PP/FileNameRules/Rules.htm
Van den Eynden V., Corti L., Woollard M., Bishop L., and Horton L. 2011. Managing and sharing data: Best practice for researchers. University of Essex, U.K. http://www.data-archive.ac.uk/media/2894/managingsharing.pdf