Phylotastic: Difference between revisions
Revision as of 18:31, 8 October 2012
Warning
The Hackathon took place at NESCent June 4 to 8. This wiki, which was used as a central resource for pre-hackathon planning, is in a state of flux. Some parts reflect pre-hackathon brainstorming and are out of date. Other parts reflect outcomes of the hackathon. The best place to see the current state of things is probably http://www.phylotastic.org.
Where to go
This is the public page for the Phylotastic hackathon (as distinct from the Leadership Team's planning page). Participants self-assembled into task groups to work on pieces of the project.
Currently, the best places to go for information are the sub-group pages here:
- Phylotastic/Architecture: Architecture and API for Phylotastic
- Phylotastic/shiny: subgroup producing demos to showcase Phylotastic capabilities
- TNRS
- Branch length group
- Phylotastic/Datastore Tree Store subgroup, with notes on Google Docs
In addition, there are separate pages that continue to be updated:
- Phylotastic/Participants - roster with photos (see also Phylotastic/Pictures)
- Phylotastic/Use Cases - use cases (ideally with data files and outputs for testing)
- HIP Funding - ideas to get support for projects
- PhylotasticPromo - outreach and promotional strategy (web site, videos, social media)
- Phylotastic/Related Projects - List and discussion of how Phylotastic builds upon and connects to other projects
Some of this material, developed during pre-hackathon planning, is no longer relevant or has gone stale:
- Phylotastic/Schedule - hackathon event schedule
- PhylotasticSpec - for developing detailed specifications (includes scoping statements)
- Phylotastic/Datastore - for considering options for a Phylotastic data store
Projected tangible outcomes
The table below includes tangible outcomes of the hackathon such as code repositories, live demos, specifications, and documentation.
Group | Description | Item (link) | Documentation (link) | NEAD | responsible person |
---|---|---|---|---|---|
all | manuscript (evol bioinfo?) | draft ms | NA | NA | Arlin |
all | iEvoBio talk | slides at slideshare | NA | yes | Karen, Arlin |
all | promo (screencast) | PhylotasticPromo | NA | NA | Rutger, Arlin |
all | swag - phylotastic t-shirts, anyone? | PhyloT Vote for Phylotastic | NA | no | Meg? |
arch | demo galaxy server | live demo and code (github) | base class and screencast | yes | Rutger |
arch | demo topology server | live demo and code on github | README.pod | yes | Rutger |
arch | extensions to phylomatic | github | NA | yes | Cam |
arch | prototype controller architecture in nodeJS | github project | [1] | no | Helena |
arch | prototype controller as Perl CGI script | https://github.com/phylotastic/cgi | README on github | yes | Ben |
arch | report: a reference architecture for phylotastic services | draft | NA | no | Helena |
branch | DateLife demo service to annotate tree with dates | http://datelife.org | NA | yes | Brian O. |
branch | iEvoBio challenge talk | YouTube video | NA | yes | Brian O. |
branch | Publication | a specialized journal | NA | NA | Brian O. |
shiny | demo for reconcile-tree use-case | live demo | NA | yes | Chris B. |
shiny | Mesquite-o-tastic demo module | Java code on github | screencast | yes | Arlin & Peter |
shiny | scripts to convert Goloboff tree from TNT | dir with perl code | POD within code | yes | Arlin |
shiny | 5 blogs about the event | blogspot | NA | no | Holly |
shiny | refinement of gene duplication inference algorithm implementation | dir with Java code | limited | no | Christian Z. |
TNRS | API specification | API | TNRS | yes | Naim |
TNRS | NCBI implementation of the API | github | NCBI | no | Siavash |
TNRS | MSW3 implementation of the API | github | MSW3 | no | Siavash |
TNRS | Demo server (TaxoSaurus) | Demo | TNRS | yes | Naim |
TNRS (treestore) | RDF model and ontology for TNRS requests and results | link to release | NA | yes | Hilmar |
treestore | New release of CDAO ontology adopting OBO conventions | link to release | NA | yes | Jim |
treestore | Prototype tree-pruning SADI service | Github | NA | yes | Jim |
treestore | Perl ingestor of Newick trees/TNRS connection | github | NA | no | Enrico |
treestore | PhyloWS REST wrapper around tree store | live demo | NA | no | Mark |
Background
A problem faced in many areas of life sciences research, from community ecology to comparative genomics to biomedical genetics, is to put the data available for a set of species into a phylogenetic context, based on a "species tree". For all we know, scientists are facing this type of problem hundreds of times every day. The past decade of efforts to assemble a large "tree of life", a phylogeny for all species, has produced many "megatrees" or "supertrees", usually limited to a particular group of organisms such as fungi, mammals or plants. Most scientists don't know how to use such huge trees. Yet it ought to be possible to address the scientific demand for species trees by taking the existing supertrees, pruning away unneeded parts, and grafting on (where possible) missing species.
An existing tool called "phylomatic" does precisely this: starting with a user-supplied list of species and a huge phylogenetic topology for plant families, it grafts the species onto the tree wherever it can match the family name, and it prunes away all the rest. This is just a topology, so users find ways to add branch lengths to the resulting tree. The result is that the user, so long as she is only interested in plants, can get a phylogeny for an arbitrary list of named species. Phylomatic rocks: its frequent use shows that big species trees are highly useful for applications in ecology, biodiversity, and trait analysis when the interfaces that serve user needs, and the megatree providing vast coverage, are available.
This suggests that if a more general tool can be built, it will be extraordinarily useful, especially if
- it is an open standard that can be implemented in many ways
- the back-end data store is populated with large phylogenies available for fungi, fish, mammals, butterflies, etc (not just plants)
- the core functionality (name-matching, grafting & pruning) is modularized in open-source bioinfo toolboxes
- methods for adding branch lengths are easier and more generalized
- all of the above operations are wrapped up as web services that can be invoked from existing computing environments
If this were a web service, we could plug it into Mesquite, and users could load up their species-based character matrix, then get a tree for it. In fact, let's go back a step, to consider users with only a list of species, and no data to compare: consider an even more open-ended discovery environment, which we could implement in Galaxy or Taverna (given that this is all based on web services). The user starts with a list of species (or a higher taxon), and a request for some useful types of data that could be obtained by querying various available sources, e.g., for each species, whether it has a cytochrome oxidase sequence in GenBank, whether it is found in California, where the nearest specimen is, etc.
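The graft-and-prune step described earlier in this section can be sketched in a few lines. This is only an illustration of the idea, not Phylomatic's actual code: the nested-list tree representation, the function names `graft_and_prune` and `to_newick`, and the toy family topology are all assumptions made for the example.

```python
from collections import defaultdict


def graft_and_prune(backbone, species_to_family):
    """Graft species onto a family-level backbone and prune everything else.

    backbone: nested lists; each leaf is a family name (str).
    species_to_family: dict mapping species name -> family name.
    Returns a nested-list species tree, or None if no family matched.
    Species whose family is absent from the backbone are silently dropped.
    """
    by_family = defaultdict(list)
    for species, family in species_to_family.items():
        by_family[family].append(species)

    def walk(node):
        if isinstance(node, str):  # leaf of the backbone = a family
            members = sorted(by_family.get(node, []))
            if not members:
                return None  # no requested species here: prune
            # Graft: one species replaces the family tip,
            # several species become an unresolved polytomy.
            return members[0] if len(members) == 1 else members
        kept = [sub for sub in (walk(child) for child in node) if sub is not None]
        if not kept:
            return None
        # Collapse internal nodes left with a single child.
        return kept[0] if len(kept) == 1 else kept

    return walk(backbone)


def to_newick(node):
    """Serialize a nested-list tree as a Newick topology (no branch lengths)."""
    if isinstance(node, str):
        return node.replace(" ", "_")
    return "(" + ",".join(to_newick(child) for child in node) + ")"


# Toy backbone for illustration only; not a real megatree.
backbone = [["Fagaceae", "Betulaceae"], ["Rosaceae", "Fabaceae"]]
species = {
    "Quercus alba": "Fagaceae",
    "Quercus rubra": "Fagaceae",
    "Malus domestica": "Rosaceae",
}
print(to_newick(graft_and_prune(backbone, species)) + ";")
```

As in the description above, the result is a topology only; branch lengths would have to be added in a separate step.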
Resources: software, references, tutorials, and other useful links
Add links to papers, websites, code, tutorials, etc that would help people get up to speed on any of the proposed tasks.
- about the Phylotastic project
- pruning and grafting
  - Phylomatic web home
  - "Phylomatic: tree assembly for applied phylogenetics" (PDF) by Webb & Donoghue, 2005
  - Rutger's proof-of-concept uses map-reduce
  - "Phylogenetic Diversity within Seconds" shows that pruning and grafting with 10^5 leaf nodes can be done in seconds
- about phyloinformatics web services APIs
- Taxonomic Name Resolution
- ideas and resources for species-wise mashups
  - Rod Page's http://ispecies.org creates an on-the-fly web page for a species based on info from NCBI, Google Scholar, etc.
- standards for representing data
  - "NeXML: rich, extensible, and verifiable representation of comparative data and metadata" describes an extensible XML format for comparative data
- adaptable viz environments
- adaptable workflow environments
After the hackathon
Opportunities right after the hackathon to build on the phylotastic momentum
- do a challenge project for Geneious, present it at iEvoBio
- develop slide presentation to accompany PhylotasticiEvoBio abstract for iEvoBio 2012
- do the iEvoBio challenge at iEvoBio
- work on Galaxy integration at a workshop
  - ISMB in Long Beach, July 13: Bioinformatics Software Interoperability (BSI SIG) - approaches to interoperability, including Cytoscape, Galaxy, GenePattern, GenomeSpace, and others, including the opportunity to adapt tools to one of these environments in a hackathon session (http://www.broadinstitute.org/bsi-sig/)
  - go to Chicago, July 25 to 27, for the 2012 Galaxy Community Conference (GCC2012, http://galaxyproject.org/GCC2012)
Manuscript
Phylotastic Architecture
A draft design resulted from pre-hackathon planning. It was then completely overhauled and superseded by the work of the architecture subgroup.