PAGE 1
VIVO: Sharing Data for Research Discovery
Mike Conlon, University of Florida, mconlon@ufl.edu
PAGE 2
Public, structured linked data about investigators' interests, activities, and accomplishments, and tools to use that data to advance science
PAGE 4
VIVO Searchlight
PAGE 7
Data Production
PAGE 8
Producing Data
PAGE 9
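    library(XML)    # xmlParse, getNodeSet, xmlValue, xmlAttrs
    library(RCurl)  # getURI

    # Recursively collect an organization and its sub-organizations from VIVO RDF/XML.
    # Assumes each URI resolves to RDF/XML that declares the rdfs and j.1 prefixes
    # (j.1 is an auto-generated namespace prefix in VIVO's RDF output).
    processOrg <- function(uri) {
      rdf <- getURI(uri, httpheader = c(Accept = "application/rdf+xml"),
                    followlocation = TRUE)
      x <- xmlParse(rdf, asText = TRUE)
      name <- xmlValue(getNodeSet(x, "//rdfs:label")[[1]])
      subs <- getNodeSet(x, "//j.1:hasSubOrganization")
      if (length(subs) == 0)
        return(list(name = name, subs = NULL))
      u <- list()
      for (i in seq_along(subs)) {
        sub.uri <- xmlAttrs(subs[[i]])[["resource"]]   # rdf:resource attribute
        # list() keeps each child as one node instead of splicing its fields into u
        u <- c(u, list(processOrg(sub.uri)))
      }
      list(name = name, subs = u)
    }

VIVO produces human- and machine-readable formats. Software reads RDF from VIVO and displays it.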
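Displaying such a hierarchy can be sketched in base R. This is a minimal illustration, not part of VIVO: printOrg is a hypothetical helper, and it assumes each organization is a list(name = ..., subs = ...) whose subs element is a list of child nodes (or NULL). The tree below is made up so the example runs without network access.

```r
# Hypothetical helper: print an organization tree as an indented outline.
# Assumes each node is list(name = <character>, subs = <list of child nodes or NULL>).
printOrg <- function(node, depth = 0) {
  cat(strrep("  ", depth), node$name, "\n", sep = "")
  for (child in node$subs) {   # looping over NULL runs zero times
    printOrg(child, depth + 1)
  }
  invisible(NULL)
}

# Made-up example tree, for illustration only
tree <- list(
  name = "University",
  subs = list(
    list(name = "College of Medicine",
         subs = list(list(name = "Dept. of Genetics", subs = NULL))),
    list(name = "Genetics Institute", subs = NULL)
  )
)

printOrg(tree)
# University
#   College of Medicine
#     Dept. of Genetics
#   Genetics Institute
```

The recursion mirrors the crawl: each level of sub-organizations adds one level of indentation.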
PAGE 10
Data Sharing
Photograph by J. G. Park, Flickr.com
Photograph by Ell Brown, Flickr.com
PAGE 11
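Information is stored using the Resource Description Framework (RDF) as statements of subject, predicate, and object.
[Figure: an example RDF graph labeled Subject / Predicate / Object. Jane Smith is connected by the predicates "professor in", "author of", and "has affiliation with" to Dept. of Genetics, College of Medicine, a journal article, a book chapter, a book, and the Genetics Institute. Captions: "A Web of Data", "The Semantic Web".]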
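To make the triple model concrete, the sketch below stores a few statements loosely modeled on the diagram in a base-R data frame and answers a simple pattern query. The pairings of predicates and objects are illustrative, the predicate strings are informal labels rather than VIVO ontology terms, and match_triples is a hypothetical helper; a real deployment would use an RDF store queried with SPARQL.

```r
# Illustrative statements loosely modeled on the slide's diagram; predicate
# strings are informal labels, not VIVO ontology terms.
triples <- data.frame(
  subject   = rep("Jane Smith", 4),
  predicate = c("professor in", "author of", "author of", "has affiliation with"),
  object    = c("Dept. of Genetics", "Journal article", "Book chapter",
                "Genetics Institute"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: match a subject/predicate/object pattern,
# where NA in any position acts as a wildcard.
match_triples <- function(df, s = NA, p = NA, o = NA) {
  keep <- (is.na(s) | df$subject == s) &
          (is.na(p) | df$predicate == p) &
          (is.na(o) | df$object == o)
  df[keep, , drop = FALSE]
}

# What has Jane Smith authored?
match_triples(triples, s = "Jane Smith", p = "author of")$object
# returns c("Journal article", "Book chapter")
```

Because every fact has the same three-part shape, one matching function covers questions about any subject, predicate, or object.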
PAGE 13
The Role of the Archive
Collate data, settle on final semantics, and make the data ready for consumption
PAGE 14
Institutions record activities, interests, and accomplishments
PAGE 15
Data, Tools and Scientists
PAGE 16
Data Consumption
Photograph by ScoopMedia, Flickr.com
Photograph by Janet Tarbox, Flickr.com
PAGE 17
A Consumption Scenario
Find all faculty members whose genetic work is implicated in breast cancer.
VIVO will store information about faculty and associate them with genes. Diseaseome associates genes with diseases. The query resolves across VIVO and the data sources it links to.
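The scenario is essentially a join on the shared gene key. The sketch below swaps in two in-memory data frames for VIVO and Diseaseome to show the shape of that join; in practice the query would run over linked RDF (for example with SPARQL), and every faculty name, gene name, and association below is made up. faculty_for_disease is a hypothetical helper.

```r
# Made-up VIVO-style table: faculty linked to the genes they study.
vivo <- data.frame(
  faculty = c("Faculty A", "Faculty B", "Faculty C"),
  gene    = c("GeneA", "GeneB", "GeneC"),
  stringsAsFactors = FALSE
)

# Made-up Diseaseome-style table: genes linked to diseases.
diseaseome <- data.frame(
  gene    = c("GeneA", "GeneB", "GeneC"),
  disease = c("breast cancer", "breast cancer", "cystic fibrosis"),
  stringsAsFactors = FALSE
)

# Resolve the query across both sources by joining on the shared gene key,
# then keep the faculty whose linked disease matches the target.
faculty_for_disease <- function(people, genes, target) {
  joined <- merge(people, genes, by = "gene")
  sort(unique(joined$faculty[joined$disease == target]))
}

faculty_for_disease(vivo, diseaseome, "breast cancer")
# returns c("Faculty A", "Faculty B")
```

Neither table alone can answer the question; the answer only exists once the two sources are linked through the gene identifier.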
PAGE 18
Data Reasoning
"Data integration continues to be a serious bottleneck for the expectations of increased productivity in the pharmaceutical and biotechnology domain [...] relationships between gene, protein, interaction, pathway, target, drug, disease and patient and currently consist of more than 5 billion RDF statements. The dataset interconnects more than 20 complete data sources and previously unrelated data from heterogeneous knowledge."
From the LarKC (Large Knowledge Collider): http://www.larkc.eu/overview/
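As a toy illustration of reasoning over RDF-style statements (far simpler than what LarKC targets), the sketch below applies a single forward-chaining rule, transitivity of a hypothetical "part of" relation, until no new statements appear. The organization names, the relation, and the transitive_closure helper are assumptions for illustration only.

```r
# Hypothetical "part of" statements between organizations (made up).
part_of <- data.frame(
  child  = c("Dept. of Genetics", "College of Medicine"),
  parent = c("College of Medicine", "University"),
  stringsAsFactors = FALSE
)

# Forward chaining with a single rule:
#   (a part of b) and (b part of c)  =>  (a part of c)
# Repeat until no new statements are derived (a fixed point).
transitive_closure <- function(edges) {
  repeat {
    # Rename the second copy's columns so the join key is unambiguous.
    step <- merge(edges, setNames(edges, c("mid", "grand")),
                  by.x = "parent", by.y = "mid")
    derived <- data.frame(child = step$child, parent = step$grand,
                          stringsAsFactors = FALSE)
    combined <- unique(rbind(edges, derived))
    if (nrow(combined) == nrow(edges)) break
    edges <- combined
  }
  rownames(edges) <- NULL
  edges
}

closure <- transitive_closure(part_of)
# "Dept. of Genetics" is now also part of "University", a statement never entered directly.
closure$parent[closure$child == "Dept. of Genetics"]
# returns c("College of Medicine", "University")
```

Real reasoners apply many rules over far larger datasets; the fixed-point loop here only shows the basic idea of deriving statements that were never recorded explicitly.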
PAGE 19
http://vivo.ufl.edu/individual/mconlon