MIAPA/PhyloWays

From Evolutionary Interoperability and Outreach

Revision as of 14:08, 1 August 2011

PhyloWays: a list of interpreted phyloinformatics workflows

This page is intended to house a reference set of pairs, where each pair consists of:

  1. a publication reporting a phylogeny
  2. a more precise or formal description of the methods

This reference set is developed in the hope it will be useful for various projects:

  • developing the vocabulary support to annotate phylogenetics workflows
  • developing an annotation tool to create phylogenetic records that satisfy a MIAPA-like standard
  • testing natural-language-processing (NLP) tools to extract methods information from published papers
  • creating an archive where users share, like, comment, and link to workflow descriptions

overview

guidelines

cases

Angiosperm phylogeny (Soltis et al., 2011)

Publication: Soltis DE, Smith SA, Cellinese N, Wurdack KJ, Tank DC, Brockington SF, Refulio-Rodriguez NF, Walker JB, Moore MJ, Carlsward BS, et al. 2011. Angiosperm phylogeny: 17 genes, 640 taxa. Am J Bot 2011:ajb.1000404. - http://www.amjbot.org/cgi/reprint/ajb.1000404v1

Data: concatenated alignments for a superset of 14 loci/17 genes (nucleotide sequences) sampled from 640 species. Genes included 18S rDNA (nuc), 26S rDNA (nuc), atpB (cp), atp1 (mito), matK (cp), matR (mito), nad5 (mito), ndhF (cp), psbBTNH (cp 4 gene region), rbcL (cp), rpoC2 (cp), rps16 (cp), rps3 (mito), and rps4 (cp).

Alignment method: MAFFT was used to align each of the 14 loci; "adjustments were made by eye when there were obvious alignment errors due to particularly divergent or “gappy” sequences". Sites (columns) with > 50% missing data (including gaps due to indels) were removed using Phyutility (Smith and Dunn, 2008). All or subsets of the gene alignments were concatenated for phylogenetic analysis.
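The site-removal step can be illustrated with a minimal Python sketch. This is not Phyutility's implementation: the function name `prune_sites`, the toy alignment, and the choice to count only `-` and `?` as missing data are assumptions made for illustration.

```python
def prune_sites(alignment, max_missing=0.5, missing_chars="-?"):
    """Drop columns whose fraction of missing characters exceeds max_missing.

    `alignment` maps taxon name -> aligned sequence (all the same length).
    Which characters count as missing is an assumption of this sketch.
    """
    names = list(alignment)
    seqs = [alignment[name] for name in names]
    length = len(seqs[0])
    if any(len(seq) != length for seq in seqs):
        raise ValueError("all sequences must have equal length")

    keep = []
    for i in range(length):
        missing = sum(seq[i] in missing_chars for seq in seqs)
        # Keep the column unless MORE than max_missing of its cells are missing.
        if missing / len(seqs) <= max_missing:
            keep.append(i)

    return {name: "".join(alignment[name][i] for i in keep) for name in names}


# Hypothetical toy alignment: the third column is all gaps and is removed.
toy = {
    "taxonA": "AC-GT",
    "taxonB": "A--GT",
    "taxonC": "AC-G?",
    "taxonD": "A?-GT",
}
pruned = prune_sites(toy)
```

A column with exactly 50% missing data is kept here, following the "> 50%" wording above.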

Tree estimation: Independent MP and ML analyses were performed on the following data matrices: nuclear rDNA genes; cp genes; mito genes; nuclear+cp genes; all 17 genes.

  1. Method (1) - ML; 10 independent runs for each data matrix.
    • Program - RAxML (vers. 7.1; Stamatakis, 2006).
    • Model of sequence evolution - GTRGAMMA with parameters estimated separately (unlinked) for each gene partition.
    • Method for evaluating support - 100-300 bootstrap replicates
  2. Method (2) - MP; parsimony ratchet with 50 independent replicates, each run for 500 iterations; MP tree estimated as the majority-rule consensus of the best trees from each replicate; tree shown only for the 17-gene supermatrix.
    • Program - SeqBoot (Phylip; Felsenstein, 2005), PAUPRat (Sikes and Lewis, 2001) and PAUP* 4.0b10 (Swofford, 2002).
    • Method for evaluating support - bootstrap; 500 bootstrap datasets were generated using SeqBoot. For each pseudoreplicate, a ratchet file was generated with PAUPRat and run for a single 500-iteration search.
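To make the bootstrap step concrete, here is a minimal Python sketch of generating pseudoreplicate datasets by resampling alignment columns with replacement. It only illustrates the general idea; it is not SeqBoot, and the function names, the seed, and the toy alignment are assumptions.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Return one pseudoreplicate: columns drawn with replacement."""
    length = len(next(iter(alignment.values())))
    # The same sampled column indices are applied to every taxon,
    # so each resampled column stays intact across taxa.
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in cols) for name, seq in alignment.items()}


def bootstrap_datasets(alignment, n_replicates, seed=0):
    """Generate n_replicates pseudoreplicate alignments (seeded for repeatability)."""
    rng = random.Random(seed)
    return [bootstrap_alignment(alignment, rng) for _ in range(n_replicates)]


# Hypothetical toy alignment; 500 matches the number of datasets reported above.
toy_alignment = {"taxonA": "ACGTAC", "taxonB": "ACGTTC", "taxonC": "ACCTAC"}
replicates = bootstrap_datasets(toy_alignment, 500)
```

Each pseudoreplicate has the same dimensions as the input matrix, so it can be passed to the same tree search as the original data.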

Additional comments: Trees available in TreeBASE - http://www.treebase.org/treebase-web/search/study/analyses.html?id=11267 ; Polyosma mtDNA loci omitted from analysis as contaminant after assessing discordance with other loci; Cardiopteris atp1 suspected as a contaminant, but retained.

semi-formalized description

The main descriptive statement

publication Pub1 reports PhylogenyResult1.1 and PhylogenyResult2.1

About the publication

Pub1 has_authors "Soltis DE", "Smith SA", "Cellinese N" . . . 
Pub1 has_citation "Am J Bot 2011:ajb.1000404" . . .
Pub1 has_URL . . .

About phylogeny result 1.1, which is a consensus tree (?). The value is either concrete (a Newick tree) or a pointer (to a TreeBASE accession or a NeXML object).

PhylogenyResult1.1 has_value . . .  < concrete or referenced_by pointer >
PhylogenyResult1.1 has_input PhylogenyResult1.0
PhylogenyResult1.1 has_method MajorityRuleConsensus # ?? not sure
PhylogenyResult1.1 has_method_details "100 to 300 bootstrap replicates"

PhylogenyResult1.0 has_value NA  # we are not showing all the bootstrap trees
PhylogenyResult1.0 has_input Alignment1.1
PhylogenyResult1.0 has_method Method1

About phylogeny result 2.1, which is a consensus tree

PhylogenyResult2.1 has_value . . .  < concrete or referenced_by pointer >
PhylogenyResult2.1 has_input PhylogenyResult2.0
PhylogenyResult2.1 has_method MajorityRuleConsensus 
PhylogenyResult2.1 has_method_details "not sure about this" 

PhylogenyResult2.0 has_value NA  # we are not showing all the bootstrap trees
PhylogenyResult2.0 has_input Alignment1.1
PhylogenyResult2.0 has_method Method2
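As an aside on what MajorityRuleConsensus means operationally, the following Python sketch counts how often each clade occurs across a set of trees and keeps those found in more than half of them. The representation of a tree as a set of clades, the single-letter taxon names, and the function name are all assumptions of this sketch; it does not reproduce how the published consensus trees were built.

```python
from collections import Counter


def majority_rule_clades(trees, threshold=0.5):
    """Return {clade: frequency} for clades present in more than `threshold` of trees.

    Each tree is given as a set of clades; each clade is a frozenset of taxon names.
    """
    counts = Counter(clade for tree in trees for clade in set(tree))
    n_trees = len(trees)
    return {
        clade: count / n_trees
        for clade, count in counts.items()
        if count / n_trees > threshold
    }


# Three hypothetical trees over taxa A-D.
trees = [
    {frozenset("AB"), frozenset("ABC")},
    {frozenset("AB"), frozenset("ABD")},
    {frozenset("AC"), frozenset("ABC")},
]
support = majority_rule_clades(trees)
```

In this toy input the clades {A, B} and {A, B, C} each occur in two of three trees and are retained; the other clades occur only once and are dropped.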

ALIGNMENTS

About Alignment1.1, which is an edit of Alignment1.0, which is a concatenation

Alignment1.1 has_value . . . < concrete or referenced_by pointer >
Alignment1.1 has_input Alignment1.0
Alignment1.1 has_method Pruning
Alignment1.1 has_method_details "delete sites with >50% missing data" 

Alignment1.0 has_value . . . < concrete or referenced_by pointer >
Alignment1.0 has_input Alignment2.1, Alignment3.1 . . . Alignment15.1
Alignment1.0 has_method Concatenate

About Alignment2.1, a component alignment edited from a MAFFT alignment

Alignment2.1 has_value . . . < concrete or referenced_by pointer >
Alignment2.1 has_input Alignment2.0
Alignment2.1 has_method EditByHand 
Alignment2.1 has_method_details "adjust by eye where divergent or gappy sequences caused obvious alignment errors"

Alignment2.0 has_value . . . < concrete or referenced_by pointer >
Alignment2.0 has_input . . . < list of GenBank accessions, ideally > 
Alignment2.0 has_method MAFFT
Alignment2.0 has_method_details NA

Alignments 3.1 to 15.1 are similar: each one is a possibly edited version of a MAFFT alignment for an individual set of sequences.
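One way to see how these statements chain together is to store them as subject-predicate-object triples and walk the has_input links back from a result. The Python sketch below is only an illustration: the tuple encoding and the helper names `objects` and `provenance` are assumptions, and only the Alignment2.x branch is included because the other component alignments are elided above.

```python
# A subset of the statements above, encoded as (subject, predicate, object).
TRIPLES = [
    ("PhylogenyResult1.1", "has_input", "PhylogenyResult1.0"),
    ("PhylogenyResult1.1", "has_method", "MajorityRuleConsensus"),
    ("PhylogenyResult1.0", "has_input", "Alignment1.1"),
    ("PhylogenyResult1.0", "has_method", "Method1"),
    ("Alignment1.1", "has_input", "Alignment1.0"),
    ("Alignment1.1", "has_method", "Pruning"),
    ("Alignment1.0", "has_input", "Alignment2.1"),
    ("Alignment1.0", "has_method", "Concatenate"),
    ("Alignment2.1", "has_input", "Alignment2.0"),
    ("Alignment2.1", "has_method", "EditByHand"),
    ("Alignment2.0", "has_method", "MAFFT"),
]


def objects(subject, predicate, triples=TRIPLES):
    """All objects of statements matching the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def provenance(subject, triples=TRIPLES, depth=0):
    """Yield (depth, subject, methods), following has_input links depth-first."""
    yield depth, subject, objects(subject, "has_method", triples)
    for parent in objects(subject, "has_input", triples):
        yield from provenance(parent, triples, depth + 1)


# Print the chain from the ML consensus tree back to the MAFFT alignment.
steps = list(provenance("PhylogenyResult1.1"))
for depth, name, methods in steps:
    print("  " * depth + name, "via", ", ".join(methods) or "NA")
```

The same walk started from PhylogenyResult2.1 would reach the same alignments through Method2, once its statements are added to the list.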

PHYLOGENY METHODS

Method1 has_attributes
* software RAxML
* software_version 7.1
* objective_function maximum_likelihood
* sitewise_model SiteWiseModel1
* among_site_model AmongSiteModel1

SiteWiseModel1 has_attributes
* GTR

AmongSiteModel1 has_attributes
* gamma
* partitions 

Method2 has_attributes
* software PAUPRat
* software PAUP
* software_version 4.0b10
* objective_function maximum_parsimony
* search_method parsimony_ratchet
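The attribute lists above could be held as plain records and checked for completeness, which is roughly what an annotation tool for a MIAPA-like standard would need to do. In the Python sketch below, the dictionary layout, the function name, and in particular the `REQUIRED` tuple are hypothetical: which attributes such a standard would actually require is not settled on this page.

```python
# The two method descriptions above, as plain dictionaries.
METHODS = {
    "Method1": {
        "software": ["RAxML"],
        "software_version": "7.1",
        "objective_function": "maximum_likelihood",
        "sitewise_model": "SiteWiseModel1",
        "among_site_model": "AmongSiteModel1",
    },
    "Method2": {
        "software": ["PAUPRat", "PAUP"],
        "software_version": "4.0b10",
        "objective_function": "maximum_parsimony",
        "search_method": "parsimony_ratchet",
    },
}

# Hypothetical minimum set of attributes; not taken from any MIAPA document.
REQUIRED = ("software", "software_version", "objective_function")


def missing_attributes(record, required=REQUIRED):
    """List required attributes that are absent or empty in a method record."""
    return [key for key in required if not record.get(key)]


for name, record in METHODS.items():
    gaps = missing_attributes(record)
    print(name, "is missing:", ", ".join(gaps) if gaps else "nothing")
```

Note that Method2 lists two programs but only one version string, so a stricter record would probably pair each program with its own version.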

another case