Phylotastic: Difference between revisions

Revision as of 20:52, 8 April 2012

This is the public page for the Phylotastic hackathon (as distinct from the Leadership Team's planning page).

Where to go

This is the main page for pre-hackathon planning, including

participant info
agenda development
overview of phylotastic plan

There are other pages for specific topics:

PhylotasticUseCases - use cases (ideally with data files and outputs for testing)
PhylotasticSandbox - for demos, mock-ups, etc
PhylotasticSpec - for developing detailed specifications (includes scoping statements)

attention new participants!

This is your wiki to write collaboratively with other authors.

Of course you should be polite about it, but you shouldn't be shy. If everyone is shy about adding, editing, deleting, or rearranging ("re-factoring") content, then the wiki becomes a mess, or it becomes the work of one non-shy person, which isn't what we want.

So, if you can improve this page by adding, removing, rewriting, or rearranging text, please do it. Edit this wiki like you own it. You don't have to ask permission-- you have it already. If you think the wiki would benefit from comments about topic X, don't add one of those annoying little notes that says

(Bob: shouldn't we have some comments about topic X?)

Instead, just add the comments about X, or create a new section for comments about X to be added later.

resources: software, references, tutorials, and other useful links

about the Phylotastic project
- Phylotastic slide presentation in ppt or PDF format
- PhylotasticiEvoBio presentation for iEvoBio 2012
pruning and grafting
- Phylomatic web home
- Phylomatic: tree assembly for applied phylogenetics (PDF) by Webb & Donoghue, 2005
- Rutger's proof-of-concept uses map-reduce
- Phylogenetic Diversity within Seconds shows that pruning-grafting with 10^5 leaf nodes can be done in seconds.
Taxonomic Name Resolution
- iPlant's TNRS
- taxize, an R package that interfaces with phylomatic, TNRS, ITIS, uBio, EoL
ideas and resources for species-wise mashups
- Rod Page's http://ispecies.org creates an on-the-fly web page for a species based on info from NCBI, Google Scholar, etc.
standards for representing data
- "NeXML: rich, extensible, and verifiable representation of comparative data and metadata" describes an extensible XML format for comparative data
adaptable viz environments
adaptable workflow environments

before, during and after the hackathon

before the hackathon

use the wiki to participate in planning
add your info & pic to the participant table

Hackathon agenda and guiding principles

agenda

day 1: informational presentations
day 1: design discussions, spec-ing
days 2 to 5: work
day 5: wrap up

guiding principles

create a demo implementation of a system based on open standards
allow alternative implementations, at least for some steps
allow flexibility for multiple use-cases
have at least one graphical front end that will make the promise of this project clear to users

after the hackathon

Opportunities right after the hackathon to build on the phylotastic momentum

do a challenge project for Geneious, present it at iEvoBio
do the iEvoBio challenge at iEvoBio
work on Galaxy integration at a workshop
- ISMB in Long Beach, July 13: Bioinformatics Software Interoperability (BSI SIG) - approaches to interoperability among Cytoscape, Galaxy, GenePattern, GenomeSpace, and others, with the opportunity to adapt tools to one of these environments in a hackathon session (http://www.broadinstitute.org/bsi-sig/)
- go to Chicago, July 25 to 27, for the 2012 Galaxy Community Conference (GCC2012, http://galaxyproject.org/GCC2012)

Manuscript

Phylotastic design

draftiness

This is a draft plan and a place to develop ideas. The overall target of the hackathon is fixed (build phylotastic), but no single aspect of the plan has been fixed. Participants are encouraged to develop plans here in April and May, before the hackathon starts. We will have the opportunity to re-think things on day 1 of the hackathon.

goal statement

Statement of goals.

1. Build phylotastic, a collection of interoperable web services that collectively provide the means to extract a subtree (specified by tips) from any of several large species trees, and to supply branch lengths and provenance annotation.
2. For demonstration purposes, leverage these services within a graphical interface that also integrates the resulting species tree with the user's choice of several high-value types of data. Optionally, this may involve adapting an existing environment (e.g., Galaxy, Taverna) to manage a phylotastic workflow.

inputs and outputs in brief

inputs = {

the user's list of species { S }; # the main input, under the control of the user
optionally, the user's character data, one row for each species in { S };
repository of megatrees that we have built for the project;
any information on { S } conveniently available online via web services (e.g., NCBI, GBIF)

}

outputs = {

phylogeny (with branch lengths) including only species in { S }; # main output
optionally, user's comparative data with tree (NEXUS or NeXML), ready for phylogenetic character analysis;
optionally, a mash-up with other information on { S } from online resources

}

where this output is presented graphically in some viewer that is relatively adaptable, e.g., Mesquite.
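
The core transformation from inputs to the main output (a phylogeny, with branch lengths, restricted to the species in { S }) can be illustrated with a minimal sketch. This is not the project's implementation: the `Node` class, the `prune` and `to_newick` functions, and the toy megatree with its branch lengths are all assumptions made up for illustration. The one design choice worth noting is that when pruning leaves an internal node with a single child, the two branch lengths are summed so that root-to-tip distances are preserved.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Node:
    """Hypothetical tree node: tips carry a species name, internal nodes do not."""
    name: Optional[str] = None
    length: float = 0.0  # length of the branch leading to this node
    children: List["Node"] = field(default_factory=list)


def prune(node: Node, keep: Set[str]) -> Optional[Node]:
    """Return a copy of the tree containing only tips named in `keep` (or None)."""
    if not node.children:
        return Node(node.name, node.length) if node.name in keep else None
    kept = [c for c in (prune(child, keep) for child in node.children) if c is not None]
    if not kept:
        return None
    if len(kept) == 1:
        # Collapse a single-child node, summing lengths to preserve path lengths.
        only = kept[0]
        return Node(only.name, only.length + node.length, only.children)
    return Node(node.name, node.length, kept)


def to_newick(node: Node) -> str:
    """Serialize to Newick (without the trailing semicolon)."""
    label = (node.name or "").replace(" ", "_")
    if node.children:
        label = "(" + ",".join(to_newick(c) for c in node.children) + ")" + label
    return f"{label}:{node.length:g}"


# Toy megatree; the topology and branch lengths are invented for this example.
megatree = Node(children=[
    Node(length=1.0, children=[
        Node("Homo sapiens", 0.5),
        Node("Pan troglodytes", 0.5),
    ]),
    Node(length=0.7, children=[
        Node("Mus musculus", 0.8),
        Node("Rattus norvegicus", 0.8),
    ]),
])

subtree = prune(megatree, {"Homo sapiens", "Mus musculus"})
print(to_newick(subtree) + ";")
```

With the toy data above, this prints `(Homo_sapiens:1.5,Mus_musculus:1.5):0;`. Species in { S } that are absent from the megatree are silently dropped here; a real service would presumably need to report them.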

a bit more about the issue of integration and mashups

The main work of this project is to develop the "engine", the stuff that is "under the hood". However, the reason we are doing this is so that it is easier for users all over the world, in all different areas of science, to put their data in a phylogenetic context.

To illustrate the significance of this, we would like to devote a substantial fraction of the energy at the hackathon to creating integration tools that combine the engine of phylotastic with species information that is easily gathered via existing services. Here are some of the kinds of useful information that we can collect for a species using online services:

images of an individual of the species, collected from EoL or Wikipedia; or silhouettes from PhyloPic
geographic distribution of the species, from GBIF
the location of the nearest museum specimen of the species
whether a genome is available for this species, from NCBI
the number of protein sequences known for this species, from NCBI
the rDNA or cytochrome C sequence for this species, if available from NCBI
the average body size of the species
a list of publications that refer to this species from NCBI

The simplest interface to imagine, perhaps, is just a web form with a place to submit and validate a species list, and a set of check-boxes for which types of information to collect for those species. The user enters the species list, selects the desired information, and clicks "Go"; the software then retrieves the information and the phylogeny and presents them to the user for visualization (e.g., in Mesquite or some other viewer that can be adapted). For an example species mashup, see Rod Page's ispecies, which creates an on-the-fly web page for a species based on info from NCBI, Google Scholar, etc.
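
One way to picture what happens behind such a form is a table that maps each check-box to a "provider" function, applied to every validated species name. The sketch below is purely illustrative: the provider keys, the `stub` helper, and the naive `validate` rule are all hypothetical. The providers return placeholder strings rather than calling any real service, and real validation would go through a taxonomic name resolution service rather than a two-word check.

```python
from typing import Callable, Dict, List

Provider = Callable[[str], str]


def stub(kind: str) -> Provider:
    """Build a placeholder provider; a real one would query an online service."""
    return lambda species: f"<{kind} for {species}>"


# Hypothetical check-box keys mapped to placeholder providers.
PROVIDERS: Dict[str, Provider] = {
    "image": stub("image"),
    "distribution": stub("geographic distribution"),
    "genome_available": stub("genome availability"),
}


def validate(species_list: List[str]) -> List[str]:
    """Keep entries shaped like 'Genus epithet' and normalize their case."""
    cleaned = []
    for raw in species_list:
        parts = raw.split()
        if len(parts) == 2:
            cleaned.append(f"{parts[0].capitalize()} {parts[1].lower()}")
    return cleaned


def mashup(species_list: List[str], selected: List[str]) -> Dict[str, Dict[str, str]]:
    """Collect the selected kinds of information for each valid species."""
    unknown = [key for key in selected if key not in PROVIDERS]
    if unknown:
        raise ValueError(f"unknown information types: {unknown}")
    return {
        species: {key: PROVIDERS[key](species) for key in selected}
        for species in validate(species_list)
    }


report = mashup(["homo SAPIENS", "not-a-name"], ["image", "genome_available"])
for species, info in report.items():
    print(species, info)
```

Keeping the providers in a table means a new information type can be added without touching the form-handling logic, which fits the goal of allowing alternative implementations for individual steps.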

Background

A problem faced in many areas of life sciences research, from community ecology to comparative genomics to biomedical genetics, is this: given some comparative data for a set of species, put these data into a phylogenetic context. For all we know, scientists face this type of problem hundreds of times every day. Given the past decade of efforts to assemble a large "tree of life", it ought to be possible to solve this problem in many cases by taking an existing megatree of species, pruning away unneeded parts, and grafting on (where possible) missing species.

An existing tool called "phylomatic" does precisely this: starting with a user-supplied list of species and a huge phylogenetic topology for plant genera, it grafts the species onto the tree wherever it can match the genus name, and it prunes away all the rest. The result is just a topology, so users must find their own ways to add branch lengths to it. As long as she is only interested in plants, the user can get a phylogeny for an arbitrary list of named species. Phylomatic rocks: its frequent use shows that big species trees are highly useful for applications in ecology, biodiversity, and trait analysis when two things are available: interfaces that serve user needs, and a megatree providing vast coverage.

This suggests that if a more general tool can be built, it will be extraordinarily useful, especially if

it is an open standard that can be implemented in many ways
the back-end data store is populated with large phylogenies available for fungi, fish, mammals, butterflies, etc (not just plants)
the core functionality (name-matching, grafting & pruning) is modularized in open-source bioinfo toolboxes
methods for adding branch lengths are easier and more generalized
all of the above operations are wrapped up as web services that can be invoked from existing computing environments

If this were a web service, we could plug it into Mesquite, and users could load up their species-based character matrix, then get a tree for it. In fact, let's go back a step to consider users with only a list of species and no data to compare: consider an even more open-ended discovery environment, which we could implement in Galaxy or Taverna (given that this is all based on web services). The user starts with a list of species (or a higher taxon) and a request for some useful types of data that could be obtained by querying various available sources, e.g., whether it has a cytochrome oxidase sequence in GenBank, whether it is found in California, where the nearest specimen is, etc.
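
The graft-and-prune behavior described for phylomatic earlier in this section (match each species to a genus tip, graft it there, prune everything else) can be sketched in a few lines. This is a toy reading of that description, not Phylomatic's actual code: the nested-list tree representation, the function names, and the example genus topology are all assumptions for illustration. The output is topology only, and species sharing a genus are grafted as an unresolved polytomy.

```python
from typing import Dict, List, Optional, Set, Union

Tree = Union[str, List["Tree"]]  # a tip label, or a list of subtrees


def group_by_genus(species_list: List[str]) -> Dict[str, List[str]]:
    """Group non-empty 'Genus epithet' names by their first word."""
    groups: Dict[str, List[str]] = {}
    for name in species_list:
        groups.setdefault(name.split()[0], []).append(name)
    return groups


def tip_labels(tree: Tree) -> Set[str]:
    if isinstance(tree, str):
        return {tree}
    labels: Set[str] = set()
    for subtree in tree:
        labels |= tip_labels(subtree)
    return labels


def graft_and_prune(tree: Tree, groups: Dict[str, List[str]]) -> Optional[Tree]:
    """Replace genus tips with the requested species; drop unmatched genera."""
    if isinstance(tree, str):
        species = groups.get(tree, [])
        if not species:
            return None  # prune genera with no requested species
        return species[0] if len(species) == 1 else list(species)
    kept = [t for t in (graft_and_prune(sub, groups) for sub in tree) if t is not None]
    if not kept:
        return None
    return kept[0] if len(kept) == 1 else kept  # collapse single-child nodes


def to_newick(tree: Tree) -> str:
    if isinstance(tree, str):
        return tree.replace(" ", "_")
    return "(" + ",".join(to_newick(t) for t in tree) + ")"


# Toy genus-level topology, invented for this example.
genus_tree: Tree = [[["Quercus", "Fagus"], "Acer"], ["Pinus", "Picea"]]
wanted = ["Quercus alba", "Quercus rubra", "Acer rubrum", "Pinus strobus", "Bellis perennis"]

groups = group_by_genus(wanted)
result = graft_and_prune(genus_tree, groups)
known = tip_labels(genus_tree)
unmatched = sorted(n for genus, names in groups.items() if genus not in known for n in names)

print(to_newick(result) + ";")
print("unmatched:", unmatched)
```

With the toy data this prints `(((Quercus_alba,Quercus_rubra),Acer_rubrum),Pinus_strobus);` and reports `Bellis perennis` as unmatched, since its genus is not in the toy tree. Matching on the exact genus string is the simplest possible rule; misspellings and synonyms are where a taxonomic name resolution step would come in.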

Architecture

[Figure: architecture diagram (image not available)]
