Tree Annotation

From Evolutionary Interoperability and Outreach

Synopsis: Annotate a small set of large trees used as sources of phylogenetic knowledge in "Phylotastic", an automated delivery system for tree-of-life knowledge.

Quick links

  • AnnotatedPhylotasticSourceTrees - report on the set of source trees, focusing on the types of metadata available, and how they might be used in phylotastic systems
  • TreestoreMetadataQueryDemonstration - report on the model of semantic encoding, the technology for translation, the treestore technology, and the implications of this for supporting phylotastic querying
  • AdvancingMIAPA report page - report on the adequacy of the MIAPA checklist, recommendations for revisions, ontology development, challenges of semantic encoding, and (overlapping with the report above) the model of semantic encoding

Overview

Metadata annotations are an essential part of the design of phylotastic systems, enabling users to find trees based on sources and methods, and to generate a credible report of provenance for phylotastically generated trees. Yet metadata play no role in current phylotastic component implementations. The TreeAnnotation team of hackathon 2 (Enrico, Hilmar, Joachim, Arlin, Ramona and 0.5 of Andrea) set out to address this deficiency. We developed an approach with three interconnected goals:

  • create a set of 10 usefully annotated source trees
  • demonstrate metadata-based querying in a treestore
  • leverage this exercise to advance the MIAPA project

Our approach consisted of the following steps:

  1. identify 10 useful source trees with available publications
  2. generate free-text annotations
  3. encode citations and annotations in computable form
  4. load the citation, annotations, and trees into a treestore
  5. demonstrate querying based on metadata

In particular, we chose to gather metadata corresponding to the MIAPA draft checklist, to encode it as RDF using a new ontology that imports several other ontologies, and to load the results into Ben Morris's Virtuoso-based treestore implementation.

During the hackathon, group members spent their time developing and revising a strategy, interpreting source materials, developing language support, encoding annotations, implementing tools, and addressing emerging challenges.

The tangible outcomes of the group relate to phylotastic source trees (a set of trees with metadata); software tools for processing, storage and querying; an ontology to support MIAPA annotations, along with a revised MIAPA checklist and form; and written reports on these 3 types of outputs, available on this wiki.


Detailed approach

  • develop plan (day 1)
    • revise as needed
    • some work is done in parallel
  • main workflow
  1. identify 10 trees for use as phylotastic source trees
  2. annotate them in free-text form
    • create web form in Google docs for input of annotations, based on MIAPA draft checklist from TDWG 2011 workshop
    • spreadsheet has pull-down menus, plus options for free-text entries under "other"
  3. transform annotations into formal-language statements in RDF
    • encoding process is iterative with ontology editing
    • Hilmar is working on language support
    • Joachim is working on the technology for getting this into a triplestore
    • Get URI for tree from TreeStore, add annotations to that URI in Protege
  4. Load trees into TreeStore
    • Will need to have trees in the correct format
  5. execute queries to demonstrate success

Log and accomplishments

  • initial plan (day 1)
  • initial MIAPA checklist-based input form (day 1)
  • revised input form
  • plan for (temporarily) storing trees and matrices (data) separate from metadata
  • annotations of 10 trees
  • translation technology
    • NEXUS issues, DendroPy
    • Protege deals poorly with unnamed individuals
  • ontology for annotation

Citation exercise

Goal: annotate trees with citation data, encode them, import them into the treestore, and demonstrate querying based on citation metadata.


Notes on encoding

  • after some discussion, we decided to use BIBO (not DC or PRISM alone)
  • we failed to find any pre-existing method to auto-convert EndNote (or BibTeX or Zotero) records into BIBO
  • so we started hand-encoding them as Protege instances
    • authors
    • articles
      • used data property "short title" instead of object property title
      • used date of issue for publication year
    • author-lists (rdf:List?)
  • ultimately we obtained the encoded citations via PubMed --> EndNote --> BibTeX export --> Zotero --> BIBO export (Bibliontology RDF)

More annotations

MIAPA ontology

  • topology
    • gene tree vs species tree: Network:Tree:'Gene tree' or SpeciesTree
    • rooted: Network:Tree:RootedTree or UnrootedTree
    • 'Consensus tree'
  • otus
    • toTaxon, object property, points to taxon concept, can be URI from NCBI or other authority
    • derived_from specimen
    • location imported from geo
  • branch properties
    • branch lengths:
      • data property edge length
      • object property has_Annotation edge_length
    • branch support: data property has support value either bootstrap or posterior prob
  • character matrix
  • alignment method
    • name of software, version
    • parameters
    • manual correction
  • tree inference method
    • name of software, version: tree wasGeneratedBy (activity=) software procedure; software procedure wasAssociatedWith instance of software agent named "RAxML"
    • parameters: (activity) used instance of a parameter specification (which is a kind of plan)
    • character weights
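
The provenance pattern in the tree inference method bullets (tree wasGeneratedBy an activity, activity wasAssociatedWith a software agent) lends itself to a metadata query. The following is a minimal sketch of how such a SPARQL query might be assembled in Python. It assumes the PROV-O properties `prov:wasGeneratedBy` and `prov:wasAssociatedWith`, and it assumes that the agent's name is stored as a plain `rdfs:label` literal, which the notes above do not specify. The function name `trees_by_software_query` and the optional named-graph argument are illustrative only.

```python
PROV = "http://www.w3.org/ns/prov#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"


def trees_by_software_query(software_name, graph_iri=None):
    """Build a SPARQL query for trees generated by an activity that is
    associated with a software agent labelled `software_name`.

    Assumption: the agent's name is an rdfs:label plain literal.
    """
    # Escape backslashes and double quotes so the name is a valid literal.
    escaped = software_name.replace("\\", "\\\\").replace('"', '\\"')
    pattern = (
        "    ?tree prov:wasGeneratedBy ?activity .\n"
        "    ?activity prov:wasAssociatedWith ?agent .\n"
        f'    ?agent rdfs:label "{escaped}" .\n'
    )
    if graph_iri is not None:
        # Restrict the match to one named graph when an IRI is supplied.
        pattern = f"  GRAPH <{graph_iri}> {{\n{pattern}  }}\n"
    return (
        f"PREFIX prov: <{PROV}>\n"
        f"PREFIX rdfs: <{RDFS}>\n"
        "SELECT DISTINCT ?tree WHERE {\n"
        f"{pattern}"
        "}\n"
    )


query = trees_by_software_query("RAxML")
print(query)
```

The resulting string could be submitted to any SPARQL endpoint; a query on parameters would add a `prov:used` pattern on `?activity` in the same way.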

Semantic links for tree, citation, methods, etc.

  • how rooted tree connects together
 :tree1 has_root :node0 ;
  • how unrooted tree connects together, using the belongs_to_tree relation
 :node9 obo:CDAO_0000200 :tree1 ;
  • and the same for all the other nodes and edges.
  • how tree connects with citation (assume that pub1 is the root of the <bibo:AcademicArticle> individual )
 :tree1 dcterms:isReferencedBy :pub1 ;
  • some other ideas
    • :pub1 IAO:is_about :tree1
    • :pub1 documents :tree1
    • :pub1 cito:provides_methods_for :tree1
    • :pub1 cito:provides_data_for :tree1
  • how tree connects with methods annotation
:tree1 prov:wasGeneratedBy :tree_activity1 ;
  • how char matrix connects with methods annotation
:align1 prov:wasGeneratedBy :align_activity1 ;
  • how tree connects with char matrix
:tree1 prov:wasDerivedFrom :align1 ;
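
Since the annotation workflow loads annotations as N-Triples, it may help to see the links above written out with full IRIs. The following is a minimal sketch using only the Python standard library. The PROV, Dublin Core Terms, and OBO namespaces are the standard ones; the base IRI `http://example.org/treeannotation#` standing in for the `:` prefix is made up for illustration. The `has_root` link is left out because the notes above give no IRI for it.

```python
# Prefix expansions. EX is a hypothetical base IRI for the ":" prefix.
PROV = "http://www.w3.org/ns/prov#"
DCTERMS = "http://purl.org/dc/terms/"
OBO = "http://purl.obolibrary.org/obo/"
EX = "http://example.org/treeannotation#"  # assumption, not from the source

# (subject, predicate, object) links taken from the list above.
LINKS = [
    # node belongs to tree (belongs_to_tree relation)
    (EX + "node9", OBO + "CDAO_0000200", EX + "tree1"),
    # tree connects with citation
    (EX + "tree1", DCTERMS + "isReferencedBy", EX + "pub1"),
    # tree connects with methods annotation
    (EX + "tree1", PROV + "wasGeneratedBy", EX + "tree_activity1"),
    # character matrix connects with methods annotation
    (EX + "align1", PROV + "wasGeneratedBy", EX + "align_activity1"),
    # tree connects with character matrix
    (EX + "tree1", PROV + "wasDerivedFrom", EX + "align1"),
]


def to_ntriples(links):
    """Serialize IRI-only triples as N-Triples, one statement per line."""
    return "".join(f"<{s}> <{p}> <{o}> .\n" for s, p, o in links)


ntriples_text = to_ntriples(LINKS)
print(ntriples_text, end="")
```

The sketch handles IRI objects only; literal values such as edge lengths or support values would need quoting and datatype handling that is not shown here.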

Annotation Workflow

Example file: Phylomatictree.nex

1. `python treestore.py add Phylomatictree.nex nexus phylomatictree`

  • reads NEXUS file `Phylomatictree.nex`
  • stores the tree in the named graph `phylomatictree`

2. `rdfcat -out N-TRIPLE annotations.rdf > annotations.ntriples`

  • takes annotations (saved with Protege as RDF/XML, Turtle, or other format)
  • outputs N-Triples

3. `python treestore.py add annotations.ntriples ntriples phylomatictree`

  • adds the annotations to the named graph `phylomatictree`

Example file: Tree_2_Peters_et_al.newick

1. `python treestore.py add Tree_2_Peters_et_al.newick newick peters2`

  • reads Newick file `Tree_2_Peters_et_al.newick`
  • stores the tree in the named graph `peters2`

2. `rdfcat -out N-TRIPLE annotations.rdf > annotations.ntriples`

  • takes annotations (saved with Protege as RDF/XML, Turtle, or other format)
  • outputs N-Triples

3. `python treestore.py add annotations.ntriples ntriples peters2`

  • adds the annotations to the named graph `peters2`
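
Both examples follow the same three steps, so they could be driven from one small script. The following is a sketch under stated assumptions: it calls `treestore.py` and `rdfcat` exactly as written in the steps above, and the helper names `plan_workflow`, `describe`, and `run_workflow` are hypothetical. Only the command lines are printed here; `run_workflow` would execute them and needs both tools and the input files to be present.

```python
import shlex
import subprocess


def plan_workflow(tree_file, tree_format, graph, annotations="annotations.rdf"):
    """Return the three workflow steps as (argv, stdout_path) pairs.

    stdout_path is None unless the step's output is redirected to a file.
    """
    ntriples = "annotations.ntriples"
    return [
        # Step 1: load the tree into the named graph.
        (["python", "treestore.py", "add", tree_file, tree_format, graph], None),
        # Step 2: convert the Protege annotations to N-Triples.
        (["rdfcat", "-out", "N-TRIPLE", annotations], ntriples),
        # Step 3: add the annotations to the same named graph.
        (["python", "treestore.py", "add", ntriples, "ntriples", graph], None),
    ]


def describe(plan):
    """Render the plan as shell-style command lines without running anything."""
    lines = []
    for argv, stdout_path in plan:
        line = shlex.join(argv)
        if stdout_path is not None:
            line += " > " + shlex.quote(stdout_path)
        lines.append(line)
    return lines


def run_workflow(plan):
    """Run each step in order, stopping at the first failing command."""
    for argv, stdout_path in plan:
        if stdout_path is None:
            subprocess.run(argv, check=True)
        else:
            with open(stdout_path, "w") as out:
                subprocess.run(argv, check=True, stdout=out)


for line in describe(plan_workflow("Phylomatictree.nex", "nexus", "phylomatictree")):
    print(line)
```

Calling `plan_workflow("Tree_2_Peters_et_al.newick", "newick", "peters2")` gives the second example in the same form.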