PhylotasticSpec
about
This is the page for developing specifications, i.e., clear and detailed statements about behavior expected under a set of conditions. Some of these may come from the scoping statements.
Note that we can split this into a less demanding spec (e.g., Phylotastic version 0.2) and a more demanding spec (version 1.0).
some questions
1. What are the expected inputs and outputs in 80 or 90% of use cases or invocations?
   - Consider both types and amounts of information.
2. What options must be given to the user?
3. What are the sensible defaults?
4. What errors and warnings are essential?
5. What metadata must be added at this step?
In regard to #2, #4, and #5, don't over-design: remember that we can extend the interface later, but we don't want to have to subtract from it later.
Should phylogeny services return only topologies without branch lengths?
Should a client send names of higher taxa to a phylogeny service?
- Yes. Phylogeny services can ignore these if they don't know what to do with them. If the phylogeny service wants to invoke a taxonomy resolver-sampler on the inputs, it can do that.
- No.
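The "Yes" answer above implies that a phylogeny service quietly drops names it cannot place and tells the client which ones it dropped. A minimal sketch of that behavior, assuming a hypothetical `build_response` function and `PhylogenyResponse` shape (neither is a defined Phylotastic API), with the set of names the service can place standing in for its tree store:

```python
from dataclasses import dataclass, field


@dataclass
class PhylogenyResponse:
    """Hypothetical response shape: a Newick string plus any warnings."""
    newick: str
    warnings: list[str] = field(default_factory=list)


def build_response(requested: list[str], known_tips: set[str]) -> PhylogenyResponse:
    """Keep names the service can place as tips; ignore the rest with a warning.

    `known_tips` stands in for whatever the service's tree store can place;
    higher-taxon names (e.g. "Felidae") simply won't be found in it.
    """
    placed = [name for name in requested if name in known_tips]
    warnings = [f"ignored name not found in tree store: {name}"
                for name in requested if name not in known_tips]
    # Placeholder topology: a star tree over the placed names. A real service
    # would prune its stored tree to these tips instead.
    newick = "(" + ",".join(n.replace(" ", "_") for n in placed) + ");"
    return PhylogenyResponse(newick=newick, warnings=warnings)
```

For example, requesting `["Panthera leo", "Felidae", "Canis lupus"]` from a service that knows only the two species yields `(Panthera_leo,Canis_lupus);` and one warning about `Felidae`. Returning a warning rather than failing leaves it to the client to decide whether the ignored name matters.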
a strawman sketch of APIs
- The Phylotastic CORE (core broker) API
  - invokes Phylotastic services in combination to produce a scaled tree (list of OTUs --> phylogeny with OTUs)
- The Phylotastic Phylogeny Service (Operation) API
  - provides a phylogeny, with or without branch lengths, for a named list of OTUs
- The Phylotastic Name Resolution Service (Operation) API
  - translates an input list of names into qualified names (list of names --> resolved list of names)
- The Phylotastic Annotation Service (Operation) API
  - modifies a phylogeny by adding annotations to branches or nodes
- The Phylotastic Scaling Service (Operation) API
  - modifies a phylogeny by adding branch lengths or node ages
- The Phylotastic TreeStore API
  - tools to add, delete, update, and annotate the back-end treestore of a phylogeny service
- The Phylotastic Taxonomy Sampler API
  - for a taxon T, provides a sample S = {S1, S2, ...} of T according to a user-specified criterion
- Format translation services API
  - (format X --> format Y)
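The CORE broker is essentially a composition of the operation APIs listed above. A minimal sketch of that composition using Python `typing.Protocol` classes; the method names (`resolve`, `phylogeny`, `scale`), the use of Newick strings for trees, and the toy stub services are all assumptions for illustration, not part of any agreed spec:

```python
import re
from typing import Protocol


class NameResolver(Protocol):
    def resolve(self, names: list[str]) -> list[str]: ...


class PhylogenyService(Protocol):
    def phylogeny(self, otus: list[str]) -> str: ...  # returns a Newick string


class ScalingService(Protocol):
    def scale(self, newick: str) -> str: ...


def core_broker(names: list[str], resolver: NameResolver,
                phylo: PhylogenyService, scaler: ScalingService) -> str:
    """List of OTU names --> scaled phylogeny, by chaining three services."""
    resolved = resolver.resolve(names)
    topology = phylo.phylogeny(resolved)
    return scaler.scale(topology)


# Toy stand-ins for real services, for local testing only.
class CapitalizingResolver:
    def resolve(self, names: list[str]) -> list[str]:
        # "homo sapiens" -> "Homo sapiens"; a real name resolver does far more.
        return [n.strip().capitalize() for n in names]


class StarTreeService:
    def phylogeny(self, otus: list[str]) -> str:
        # Unresolved (star) topology over the OTUs, spaces -> underscores.
        return "(" + ",".join(o.replace(" ", "_") for o in otus) + ");"


class UnitLengthScaler:
    def scale(self, newick: str) -> str:
        # Gives every labelled tip a length of 1.0. It ignores internal edges,
        # so it only makes sense for the star trees produced above.
        return re.sub(r"([^(),;:]+)", r"\1:1.0", newick)
```

With the stubs, `core_broker(["homo sapiens", "pan troglodytes"], CapitalizingResolver(), StarTreeService(), UnitLengthScaler())` returns `(Homo_sapiens:1.0,Pan_troglodytes:1.0);`. Because the broker depends only on the protocols, any service that offers the same operations can be swapped in.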
scoping statements
In Scope
- Populating data store of existing trees
- Evolution of PhyloWS to support the needs of Phylomatic
- Taxonomic name resolution (embedding existing TNRS capacities)
- Pruning trees and grafting species onto them
- Branch lengths (incorporating them using existing methods)
- Integration of data and trees at the species level (e.g., mashups)
- Display of resulting trees (using existing technologies)
- Wrapping all these existing tools as web services
- NeXML syntax extensions if needed
- Determining methods for compressing NeXML representations, if needed
- Simple user interface (web form)
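Pruning a stored tree down to the requested taxa is the core operation behind "pruning trees" above. A minimal sketch, assuming a hypothetical in-memory `Node` class rather than any particular library; when pruning leaves a node with a single child, that node is suppressed and its branch length is added to the child's. Grafting is not covered here.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical tree node: optional label, branch length, children."""
    name: str | None = None
    length: float = 0.0
    children: list[Node] = field(default_factory=list)


def prune(node: Node, keep: set[str]) -> Node | None:
    """Return a copy of `node` restricted to tips in `keep`, or None if none survive."""
    if not node.children:
        return Node(node.name, node.length) if node.name in keep else None
    kept = [c for c in (prune(ch, keep) for ch in node.children) if c is not None]
    if not kept:
        return None
    if len(kept) == 1:
        # Suppress the now-unary node: fold its branch into the surviving child.
        only = kept[0]
        return Node(only.name, only.length + node.length, only.children)
    return Node(node.name, node.length, kept)


def to_newick(node: Node, is_root: bool = True) -> str:
    """Serialize to Newick; the root's own branch length is omitted."""
    inner = node.name or ""
    if node.children:
        inner = "(" + ",".join(to_newick(c, False) for c in node.children) + ")" + inner
    return inner + ";" if is_root else f"{inner}:{node.length:g}"


# ((A:1,B:1):2,C:3);
example = Node(children=[
    Node(length=2, children=[Node("A", 1), Node("B", 1)]),
    Node("C", 3),
])
```

Pruning `example` to `{"A", "C"}` gives `(A:3,C:3);`: B is removed, and A's former parent is suppressed, so its length of 2 is added to A's length of 1.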
Not In Scope
- Constructing new input trees
- Generating new data
- Arguing or evaluating the correctness of trees
- Design of new TNRS systems
- Debates about which naming system is best
- Developing new techniques to derive branch lengths
Uncertain (depends on participant skills and perspectives)
- Phylo-referencing
- MIAPA annotations of the steps; provenance annotations
Taxonomic name resolution
Critical questions and cases
- What specific operations are required for basic Phylotastic functionality? For more advanced functionality?
- Do we need to support both strict and non-strict resolution?
- Homonymy: many Linnaean binomials are reused for both a plant and an animal taxon, because the botanical and zoological codes don't preclude such overlap [[1]]. How will we avoid dramatic mix-ups?
  - Cross-code homonyms are in completely different kingdoms, so we could resolve each taxon to the kingdom level and check whether a taxon from one kingdom ends up in a clade with a bunch of taxa from another. This won't cause us problems until we have trees reaching back to the splits between kingdoms, and even then a mix-up should still be identifiable as a group of Kingdom X taxa nested inside Kingdom Y.
  - We could also allow users to specify taxa using taxonomic identifiers (ITIS TSNs, uBio LSIDs, etc.), although this won't solve all our problems either: http://www.ubio.org/browser/details.php?namebankID=238253 is Crucibulum, a cross-code homonym that uBio matches to both the fungus and the gastropod. Users should always be able to disambiguate by giving the full taxonomic name, including its authority, under either code (in this case, "Crucibulum Schumacher, 1817" (gastropod, ICZN) vs. "Crucibulum Tul. & C. Tul." (fungus, ICBN)).
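The kingdom-level check described above can be sketched as a heuristic under stated assumptions: trees are nested tuples of tip names, `kingdom_of` maps each tip to the kingdom its name resolved to, and a clade is flagged only when it is uniformly one kingdom and every other tip of its grandparent clade shares a single, different kingdom. This means an outgroup from another kingdom attached at the root is not flagged. The function names and the toy tree, in which a name resolved to the fungus Crucibulum laeve has been placed among gastropods, are hypothetical.

```python
def tips(tree) -> list[str]:
    """Tip labels of a tree written as nested tuples, e.g. (("a", "b"), "c")."""
    if isinstance(tree, str):
        return [tree]
    return [t for child in tree for t in tips(child)]


def find_nested_intruders(tree, kingdom_of: dict[str, str]) -> set[str]:
    """Flag kingdom-pure clades nested inside a clade of one other kingdom."""
    flagged: set[str] = set()

    def walk(node, ancestors):
        node_tips = tips(node)
        kingdoms = {kingdom_of[t] for t in node_tips}
        # Only clades that have a grandparent are tested, so a clade attached
        # directly to the root (e.g. an outgroup) is never flagged.
        if len(kingdoms) == 1 and len(ancestors) >= 2:
            grandparent = ancestors[-2]
            others = {kingdom_of[t] for t in tips(grandparent) if t not in node_tips}
            if len(others) == 1 and others != kingdoms:
                flagged.update(node_tips)
        if not isinstance(node, str):
            for child in node:
                walk(child, ancestors + [node])

    walk(tree, [])
    return flagged


# Toy tree: a fungus name has ended up nested among gastropods.
example_tree = (("Helix pomatia", ("Cepaea nemoralis", "Crucibulum laeve")),
                ("Littorina littorea", "Patella vulgata"))
example_kingdoms = {
    "Helix pomatia": "Animalia", "Cepaea nemoralis": "Animalia",
    "Littorina littorea": "Animalia", "Patella vulgata": "Animalia",
    "Crucibulum laeve": "Fungi",
}
```

Here `find_nested_intruders(example_tree, example_kingdoms)` returns `{"Crucibulum laeve"}`. Moving the same tip to the root as an outgroup produces no flags. Note that a kingdom check alone cannot catch homonyms that fall within the same kingdom.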