Main Page
Who we are
The EvoIO collaboration emerged in 2009 from several NESCent-sponsored activities that focused on software and data interoperability for evolutionary analysis, including the Evolutionary Informatics working group (2006-2009), and the Evolutionary Database Interoperability hackathon (2009). EvoIO aims to be a nucleating center for developing, applying and disseminating interoperability technology that connects and coordinates between stakeholders, developers, and standards bodies.
Members of the EvoIO group, which include biologists and computer scientists, have over the past 3 years harnessed a variety of collaborative events to successfully build an initial stack of interoperability technologies that is owned by the community and open to participation:
- NeXML, a NEXUS-inspired XML format that is validatable yet extensible;
- CDAO, an ontology of comparative data analysis formalizing the semantics of evolutionary data and metadata; and
- PhyloWS, a web-services interface standard for querying, retrieving, and referencing phylogenetic data on the web.
For more information about how to get involved, read on.
What's happening
Participants in EvoIO activities contributed in various ways to the iEvoBio satellite conference at Evolution 2010 in Portland, OR:
- Rutger Vos gave a talk on "TreeBase2: Rise of the machines" (Rutger Vos, Hilmar Lapp, Bill Piel, Val Tannen)
- Brandon Chisham presented "CDAO-Store: A New Vision for Data Integration" (Brandon Chisham, Trung Le, Enrico Pontelli, Tran Son, Ben Wright)
- Arlin Stoltzfus presented "EvoIO: Community-driven standards for sustainable interoperability" (Arlin Stoltzfus, Nico Cellinese, Karen Cranston, Hilmar Lapp, Sheldon McKay, Enrico Pontelli, Rutger Vos)
The EvoIO group staged a successful Phyloinformatics VoCamp on November 7-11, 2009 in Montpellier, France, co-located with the annual meeting of the International Biodiversity Information Standards Organization (TDWG). A VoCamp is a hands-on meeting for investigators to create and develop ontologies and lightweight vocabularies in support of data integration and re-use; in this case, the integration and re-use of phylogenetic trees and associated data and metadata. More information is available at VoCamp1.
How to get involved
Each stack component has an open community of developers and a mailing list. Click on the links below to find out more, sign up for a mailing list, and start contributing.
- NeXML's mailing list is nexml-discuss@lists.sourceforge.net (contact Rutger Vos, <firstname>aldo@gmail.com)
- PhyloWS's group is phylows@googlegroups.com (contact Hilmar Lapp, <firstinitial>lapp@nescent.org)
- CDAO's mailing list is cdao-discuss@lists.sourceforge.net (contact Arlin Stoltzfus, <firstname>@umd.edu)
EvoIO folks are also looking for interoperability “targets”, which can be either single projects that want to make their resources more interoperable or pairs of resources that want to integrate. Contact any of us if you have an idea.
ToLWeb2?
Based on some meetings in the winter, we developed a whitepaper calling for a meeting of ToLWeb stakeholders (contributors, research users, educational users, linked data providers) to develop a vision for ToLWeb2. This meeting would be followed by a process to develop a concrete plan and a proposal for funding. Stay tuned for further updates.
The EvoIO INTEROP project
Background
Over several years, a variety of people, including NESCent's informatics staff, NESCent's Evolutionary Informatics working group, and the participants in the recent Evolutionary Database Interoperability hackathon, laid the foundation that put us in a position to apply to the NSF INTEROP program. This program provides up to $250K per year to support a data interoperability network. The network should be multidisciplinary, and the network proposal should have both a community aspect and a technology aspect. The deadline for this program in 2009 was July 23.
What makes us competitive:
- our past success in developing the interop technologies NeXML, CDAO and PhyloWS
- the 3-part interop formula of data syntax (NeXML), semantics (CDAO) and services (PhyloWS)
- our past success in actual demonstration projects that show off interop technology
- our demonstrated commitment to including diverse projects
- our connections with a network of researchers, programmers, and data providers
In light of this, we developed a proposal for a data interoperability network focused on trees and associated data and metadata. Two key features of the proposal are the use of hackathons, and the use of the "EvoIO Stack" (NeXML, CDAO, PhyloWS) as a technological nucleus for growing an Interop network. The proposal, the project summary and description of which are given below, is currently (as of Oct 2009) under review.
The EvoIO NSF Interop Proposal
Project Summary
INTEROP: A network for enabling community-driven standards to link evolution into the global web of data (EvoIO)
PI: Arlin Stoltzfus, Center for Advanced Research in Biotechnology, University of Maryland Biotechnology Institute (CDAO, Bio::NEXUS, Nexplorer). Co-PIs: Karen Cranston, EOL and Field Museum of Natural History; Enrico Pontelli, New Mexico State University, Computer Science (CDAO); Sheldon McKay, Cold Spring Harbor Laboratory (GMOD, modENCODE, iPlant); Hilmar Lapp, NESCent (PhyloWS, BioSQL); Nico Cellinese, University of Florida, Florida Museum of Natural History (TOLKIN, RegNum).
Intellectual Merit. Evolutionary trees (“phylogenies”) organize knowledge of biodiversity and provide a framework for rigorous methods of comparative analysis used throughout the biosciences. In the past, the scope of a tree-based analysis was limited to the researcher’s “own” small data set. The great mass of currently available data makes possible far-reaching and systematic analyses, but only if trees (and associated data and metadata) can be accessed, searched, retrieved, and repurposed; that is, only if the data are interoperable. An integrated solution to this problem requires attention to the syntax and semantics of data, metadata, and services. Over the past 3 years, an “Evolutionary Informatics” working group funded by NESCent (an NSF Center) developed an interoperability “stack” consisting of NeXML (a file format for comparative data), the Comparative Data Analysis Ontology (CDAO) and PhyloWS (a web services standard). Recently, the group staged a “hackathon” that engaged a fresh group of researcher-programmers (chosen to represent community data resources) to learn, apply, and extend the EvoIO Stack, with results that show the remarkable promise of this approach to train early-career scientists, disseminate standards, and improve interoperability. The investigators will build on this approach and on their unique technology and experience to engage a larger community in improving interoperability of trees with associated data and metadata (e.g., taxonomic affiliations, sources, and character data). The EvoIO Network will organize hackathons, hold training workshops, host working groups, and implement infrastructure for community-building around emerging standards. Network staff will provide technical expertise in knowledge representation and bioinformatics, working to support standards and to build reference implementations.
The resulting EvoIO community will extend broadly into systematics-biodiversity, comparative genomics, and phylogenetics, and will penetrate into key areas of community ecology, phylogenetic epidemiology and paleobiology.
Broader impacts. The research areas affected by this proposal (all those areas in which phylogenetic trees are used routinely) are diverse and currently are not unified by professional organizations, software platforms, or standards. By bringing together scientists from various disciplines, we will develop awareness of the need for standards, cohesion around preferred approaches to interoperability, and ultimately a broad consensus on specific standards. This will be accomplished by building on the momentum of work done under prior NSF funding via NESCent. The key to developing a cohesive community in the absence of pre-existing cohesion is the hackathon mechanism, which generates success stories and arms young researcher-programmers with the know-how to create further successes. Through this mechanism, user requirements will be translated into standards and specifications, and implemented in community software tools. Reference Implementations (developed concurrently with standards and specifications) will be used to aid in standards development and training. Hackathons will take place in eastern, western, and central locations to maximize diversity in impact, and will include strategically selected participants as well as a large fraction of participants chosen in response to a broad solicitation in the biodiversity, systematics, genomics, and phylogenetics communities. Standards and specifications developed by the Network will be disseminated via the relevant international standards group (the TDWG Phylogenetics Standard Interest Group). Efforts will be made to integrate ideas from this project into existing educational and outreach programs, with particular focus on involving students from NMSU (a minority-serving institution).
Project Description
The Project Description is available as a PDF.