Reuse Barriers

From Evolutionary Interoperability and Outreach
Revision as of 22:45, 17 March 2011 by Hilmar (talk | contribs) (→‎Incorrect support of exchange standards)

This page is a developing list of practical barriers to the reuse of evolutionary data, in particular phylogenetic data and trees. Since digital access to trees (and data) is normally the starting point for reuse, the scope deliberately includes barriers to the digital availability and findability of trees, and to free access to them, such as barriers to deposition in a digital archive.

Incorrect or missing support for exchange standards

The most widely used exchange standard for phylogenetic data is NEXUS. NEXUS files are difficult to validate for compliance with the standard, and some programs either do not support the standard or support it incorrectly. Digital archives cannot support every exchange format and every variation in how the formats are implemented. For example, TreeBASE has had to resort to supporting only NEXUS files produced by the Mesquite tool. This makes digital archiving and sharing of trees for reuse burdensome and error-prone for users of all other programs.

Examples:

  • WinClada sometimes produces faulty NEXUS files. For example, WinClada's method of indexing character labels and states is wrong.
    • Symptom: "I keep encountering an error when I try to upload the file to TreeBASE. The file has been created by WinClada."
    • Recommended workaround: Open the file in Mesquite, check that everything looks right, and then have Mesquite save the file. Submit the saved file to TreeBASE.
  • MEGA does not support NEXUS.
    • Recommended workaround: MEGA can export trees to Newick, but the export omits branch lengths. A program like PAUP can recalculate the branch lengths. Follow these steps:
      1. Export the tree to Newick. For example, suppose you get the following: "((Apple,Pear),Banana)"
      2. Edit the Newick using a text editor so that you create a TREE BLOCK:
        BEGIN TREES;
        TREE Fig_3 = ((Apple,Pear),Banana);
        END;

        (Note the semicolon at the end of each line of the block.)
      3. Copy/paste this TREE BLOCK at the end of your NEXUS file.
      4. If you need to recalculate branch lengths, continue with step 5. Otherwise, open the NEXUS file in Mesquite and check that there are no errors. Re-root your tree and edit taxon labels as needed (scientific names must be written out in full). Save the file. The saved file can be uploaded to TreeBASE.
      5. To re-calculate branch lengths:
        1. Copy/paste this PAUP BLOCK to the end of the NEXUS file:
          BEGIN PAUP;
          set criterion=likelihood;
          lset nst=2 basefreq=empirical variant=hky rates=gamma ncat=4 shape=estimate pinvar=estimate;
          lscores 1 / sitelikes;
          savetrees file=foo.tre brlens;
          END;
        2. Edit the lset command to match your substitution model.
        3. Execute the NEXUS file in PAUP. This should generate a new tree file called foo.tre, containing branch lengths.
        4. Remove the old TREE BLOCK and the PAUP BLOCK from your NEXUS file.
        5. Open the NEXUS file in Mesquite, and choose "Include File..." from the File menu to import your foo.tre file.
        6. Save the file.
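
Steps 2 and 3 of the MEGA workaround (wrapping the exported Newick string in a TREE BLOCK and appending it to the NEXUS file) can also be scripted rather than done by hand in a text editor. The following is only a minimal sketch in Python. The function names newick_to_trees_block and append_block are hypothetical helpers invented for this illustration, not part of MEGA, Mesquite, PAUP, or TreeBASE, and the only checks performed are balanced parentheses and a leading #NEXUS token. The result should still be opened and checked in Mesquite as described above.

```python
def newick_to_trees_block(newick: str, tree_name: str = "Fig_3") -> str:
    """Wrap a Newick string in a minimal NEXUS TREES block (hypothetical helper)."""
    # Drop surrounding whitespace, quotes, and any trailing semicolon,
    # so that exactly one semicolon is written after the tree.
    newick = newick.strip().strip('"').strip().rstrip(";").rstrip()
    if not newick:
        raise ValueError("empty Newick string")
    if newick.count("(") != newick.count(")"):
        raise ValueError("unbalanced parentheses in Newick string")
    return f"BEGIN TREES;\nTREE {tree_name} = {newick};\nEND;\n"


def append_block(nexus_text: str, block: str) -> str:
    """Append a block to the end of NEXUS file contents (hypothetical helper)."""
    # This is a sanity check only, not validation against the NEXUS standard.
    if not nexus_text.lstrip().upper().startswith("#NEXUS"):
        raise ValueError("text does not start with #NEXUS")
    return nexus_text.rstrip("\n") + "\n\n" + block


if __name__ == "__main__":
    # The Newick string from step 1 of the workaround.
    print(newick_to_trees_block("((Apple,Pear),Banana)"))
```

To use it on a real file, read the NEXUS file into a string, pass it to append_block together with the generated block, and write the result back out. The tree name Fig_3 is simply the one used in the example above and can be replaced.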