MIAPA/PhyloWays
PhyloWays: a list of interpreted phyloinformatics workflows
This page is intended to house a reference set of pairs, where each pair consists of:
- a publication reporting a phylogeny
- a more precise or formal description of the methods
This reference set is developed in the hope it will be useful for various projects:
- developing the vocabulary support to annotate phylogenetics workflows
- developing an annotation tool to create phylogenetic records that satisfy a MIAPA-like standard
- testing natural-language-processing (NLP) tools to extract methods information from published papers
- creating an archive where users share, like, comment, and link to workflow descriptions
overview
guidelines
cases
Angiosperm phylogeny (Soltis et al., 2011)
Publication: Soltis DE, Smith SA, Cellinese N, Wurdack KJ, Tank DC, Brockington SF, Refulio-Rodriguez NF, Walker JB, Moore MJ, Carlsward BS, et al. 2011. Angiosperm phylogeny: 17 genes, 640 taxa. Am J Bot 2011:ajb.1000404. - http://www.amjbot.org/cgi/reprint/ajb.1000404v1
Data: concatenated alignments for a superset of 14 loci/17 genes (nucleotide sequences) sampled from 640 species. Genes included 18S rDNA (nuc), 26S rDNA (nuc), atpB (cp), atp1 (mito), matK (cp), matR (mito), nad5 (mito), ndhF (cp), psbBTNH (cp 4 gene region), rbcL (cp), rpoC2 (cp), rps16 (cp), rps3 (mito), and rps4 (cp).
Alignment method: MAFFT used to align each of 14 loci; "adjustments were made by eye when there were obvious alignment errors due to particularly divergent or “gappy” sequences"; sites (columns) with > 50% missing data (including gaps due to indels) were removed using Phyutility (Smith and Dunn, 2008). All or subsets of the gene alignments were concatenated for phylogenetic analysis.
Tree estimation: Independent MP and ML analyses were performed on the following data matrices: nuclear rDNA genes; cp genes; mito genes; nuclear+cp genes; all 17 genes.
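The site-removal step can be illustrated with a minimal sketch. It is not Phyutility's implementation: the function name `prune_columns`, the toy alignment, and the set of symbols counted as missing (`-`, `?`, `N`) are assumptions made for illustration only.

```python
# Assumption: these symbols count as missing data; the paper does not list them.
MISSING = set("-?Nn")

def prune_columns(alignment, max_missing=0.5):
    """Drop columns whose fraction of missing symbols exceeds max_missing."""
    names = list(alignment)
    seqs = [alignment[name] for name in names]
    if not seqs:
        return {}
    if len({len(s) for s in seqs}) > 1:
        raise ValueError("all sequences must have the same aligned length")
    n_rows = len(seqs)
    # Keep a column when at most max_missing of its cells are missing.
    keep = [
        i for i in range(len(seqs[0]))
        if sum(s[i] in MISSING for s in seqs) / n_rows <= max_missing
    ]
    return {name: "".join(s[i] for i in keep) for name, s in zip(names, seqs)}

# Hypothetical toy alignment: the third column is all gaps and is removed;
# the second column is exactly 50% missing and is therefore retained.
toy = {
    "taxonA": "AC-GT",
    "taxonB": "A--GT",
    "taxonC": "AC-G-",
    "taxonD": "A?-GT",
}
pruned = prune_columns(toy)
```

The threshold is applied as "strictly greater than 50%", matching the wording of the description above.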
- Method (1) - ML; 10 independent runs for each data matrix.
- Program - RAxML (vers. 7.1; Stamatakis, 2006).
- Model of sequence evolution - GTRGAMMA with parameters estimated separately (unlinked) for each gene partition.
- Method for evaluating support - 100-300 bootstrap replicates
- Method (2) - MP; parsimony ratchet with 50 independent replicates, each run for 500 iterations; MP tree estimated as the majority-rule consensus of the best trees from each replicate; tree shown only for the 17-gene supermatrix.
- Program - SeqBoot (Phylip; Felsenstein, 2005), PAUPRat (Sikes and Lewis, 2001), and PAUP* 4.0b10 (Swofford, 2002).
- Method for evaluating support - bootstrap; 500 bootstrap datasets were generated using SeqBoot. A PAUPRat-generated ratchet file was created for each pseudoreplicate and run for a single 500-iteration search.
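The bootstrap datasets mentioned above are pseudoreplicate alignments built by resampling sites with replacement. The following is a minimal sketch of that idea, not SeqBoot's code; the function names, the toy alignment, and the fixed seed are assumptions for illustration.

```python
import random

def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement, keeping the original length."""
    length = len(next(iter(alignment.values())))
    cols = [rng.randrange(length) for _ in range(length)]
    # The same sampled column indices are applied to every taxon.
    return {name: "".join(seq[i] for i in cols) for name, seq in alignment.items()}

def bootstrap_datasets(alignment, n_replicates, seed=0):
    """Generate n_replicates pseudoreplicate alignments from one alignment."""
    rng = random.Random(seed)  # seeded only so the sketch is reproducible
    return [bootstrap_alignment(alignment, rng) for _ in range(n_replicates)]

# Hypothetical toy alignment; the study used 500 pseudoreplicates for MP.
toy = {"taxonA": "ACGT", "taxonB": "ACGA", "taxonC": "TCGA"}
replicates = bootstrap_datasets(toy, n_replicates=500)
```

Each pseudoreplicate would then be passed to its own tree search (here, a single 500-iteration ratchet run), and the resulting trees summarized to obtain support values.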
Additional comments: Trees available in TreeBASE - http://www.treebase.org/treebase-web/search/study/analyses.html?id=11267 ; Polyosma mtDNA loci omitted from analysis as contaminant after assessing discordance with other loci; Cardiopteris atp1 suspected as a contaminant, but retained.
semi-formalized description
The main descriptive statement
publication Pub1 reports PhylogenyResult1.1 and PhylogenyResult2.1
About the publication
Pub1 has_authors "Soltis DE", "Smith SA", "Cellinese N" . . .
Pub1 has_citation "Am J Bot 2011:ajb.1000404" . . .
Pub1 has_URL . . .
About phylogeny result 1.1, which is a consensus tree (?). The value is either concrete (a Newick tree) or a pointer (to a TreeBASE accession or a NeXML object).
PhylogenyResult1.1 has_value . . . < concrete or referenced_by pointer >
PhylogenyResult1.1 has_input PhylogenyResult1.0
PhylogenyResult1.1 has_method MajorityRuleConsensus # ?? not sure
PhylogenyResult1.1 has_method_details "100 to 300 bootstrap replicates"
PhylogenyResult1.0 has_value NA # we are not showing all the bootstrap trees
PhylogenyResult1.0 has_input Alignment1.1
PhylogenyResult1.0 has_method Method1
About phylogeny result 2.1, which is a consensus tree
PhylogenyResult2.1 has_value . . . < concrete or referenced_by pointer >
PhylogenyResult2.1 has_input PhylogenyResult2.0
PhylogenyResult2.1 has_method MajorityRuleConsensus
PhylogenyResult2.1 has_method_details "not sure about this"
PhylogenyResult2.0 has_value NA # we are not showing all the bootstrap trees
PhylogenyResult2.0 has_input Alignment1.1
PhylogenyResult2.0 has_method Method2
ALIGNMENTS
About Alignment1.1, which is an edit of Alignment1.0, which is a concatenation
Alignment1.1 has_value . . . < concrete or referenced_by pointer >
Alignment1.1 has_input Alignment1.0
Alignment1.1 has_method Pruning
Alignment1.1 has_method_details "delete sites with >50% missing data"
Alignment1.0 has_value . . . < concrete or referenced_by pointer >
Alignment1.0 has_input Alignment2.1, Alignment3.1 . . . Alignment15.1
Alignment1.0 has_method Concatenate
About Alignment2.1, a component alignment edited from a MAFFT alignment
Alignment2.1 has_value . . . < concrete or referenced_by pointer >
Alignment2.1 has_input Alignment2.0
Alignment2.1 has_method EditByHand
Alignment2.1 has_method_details "remove divergent or gappy sequences"
Alignment2.0 has_value . . . < concrete or referenced_by pointer >
Alignment2.0 has_input . . . < list of GenBank accessions, ideally >
Alignment2.0 has_method MAFFT
Alignment2.0 has_method_details NA
Alignments 3.1 to 15.1 are similar: each one is a possibly edited version of a MAFFT alignment for an individual set of sequences.
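Statements of this "subject predicate object" form can be read as triples and followed backwards from a result to its inputs. The sketch below assumes a simple whitespace-separated syntax with `#` comments; the parser, the function names, and the abbreviated statement list (only one of the component alignments is included) are illustrative assumptions, not part of any MIAPA specification.

```python
# Abbreviated copy of the statements above; only Alignment2.1 is listed as an
# input to the concatenation, the other component alignments are omitted here.
STATEMENTS = """
PhylogenyResult1.1 has_input PhylogenyResult1.0
PhylogenyResult1.1 has_method MajorityRuleConsensus  # ?? not sure
PhylogenyResult1.0 has_input Alignment1.1
PhylogenyResult1.0 has_method Method1
Alignment1.1 has_input Alignment1.0
Alignment1.1 has_method Pruning
Alignment1.0 has_input Alignment2.1
Alignment1.0 has_method Concatenate
Alignment2.1 has_input Alignment2.0
Alignment2.1 has_method EditByHand
Alignment2.0 has_method MAFFT
"""

def parse(text):
    """Turn 'subject predicate object' lines into (s, p, o) tuples."""
    triples = []
    for line in text.strip().splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if line:
            subject, predicate, obj = line.split(None, 2)
            triples.append((subject, predicate, obj))
    return triples

def lineage(triples, start):
    """Walk has_input links from start; return (object, method) pairs in order."""
    inputs, methods = {}, {}
    for s, p, o in triples:
        if p == "has_input":
            inputs.setdefault(s, []).append(o)
        elif p == "has_method":
            methods[s] = o
    steps, seen, stack = [], set(), [start]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            steps.append((node, methods.get(node)))
            stack.extend(reversed(inputs.get(node, [])))
    return steps

steps = lineage(parse(STATEMENTS), "PhylogenyResult1.1")
for node, method in steps:
    print(f"{node} <- {method}")
```

Walking the chain this way makes the workflow explicit: the consensus tree depends on the bootstrap trees, which depend on the pruned, concatenated, hand-edited MAFFT alignments.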
PHYLOGENY METHODS
Method1 has_attributes
* software RAxML
* software_version 7.1
* objective_function maximum_likelihood
* sitewise_model SiteWiseModel1
* among_site_model AmongSiteModel1
SiteWiseModel1 has_attributes
* GTR
AmongSiteModel1 has_attributes
* gamma
* partitions
Method2 has_attributes
* software PAUPRat
* software PAUP
* software_version 4.0b10
* objective_function maximum_parsimony
* search_method parsimony_ratchet
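One way to hold these attribute lists in memory is as nested records in which a value may name another record. This is only a sketch: the keys `model`, `rate_variation`, and `partitioned` are hypothetical names introduced here for the bare attributes (GTR, gamma, partitions), and `resolve` is an illustrative helper.

```python
RECORDS = {
    "Method1": {
        "software": ["RAxML"],
        "software_version": "7.1",
        "objective_function": "maximum_likelihood",
        "sitewise_model": "SiteWiseModel1",
        "among_site_model": "AmongSiteModel1",
    },
    # Hypothetical key names for the bare attributes in the description.
    "SiteWiseModel1": {"model": "GTR"},
    "AmongSiteModel1": {"rate_variation": "gamma", "partitioned": True},
    "Method2": {
        "software": ["PAUPRat", "PAUP"],
        "software_version": "4.0b10",
        "objective_function": "maximum_parsimony",
        "search_method": "parsimony_ratchet",
    },
}

def resolve(name, records):
    """Return a copy of a record with references to other records expanded."""
    resolved = {}
    for key, value in records[name].items():
        if isinstance(value, str) and value in records:
            resolved[key] = resolve(value, records)  # follow the reference
        else:
            resolved[key] = value
    return resolved

method1 = resolve("Method1", RECORDS)
```

Keeping the substitution models as separate named records, as the description does, lets several methods share one model definition.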