<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://www.evoio.org/w/index.php?action=history&amp;feed=atom&amp;title=Phylotastic%2FDatastore</id>
	<title>Phylotastic/Datastore - Revision history</title>
	<link rel="self" type="application/atom+xml" href="https://www.evoio.org/w/index.php?action=history&amp;feed=atom&amp;title=Phylotastic%2FDatastore"/>
	<link rel="alternate" type="text/html" href="https://www.evoio.org/w/index.php?title=Phylotastic/Datastore&amp;action=history"/>
	<updated>2026-05-16T21:34:34Z</updated>
	<subtitle>Revision history for this page on the wiki</subtitle>
	<generator>MediaWiki 1.41.1</generator>
	<entry>
		<id>https://www.evoio.org/w/index.php?title=Phylotastic/Datastore&amp;diff=3097&amp;oldid=prev</id>
		<title>Hilmar at 17:44, 10 June 2012</title>
		<link rel="alternate" type="text/html" href="https://www.evoio.org/w/index.php?title=Phylotastic/Datastore&amp;diff=3097&amp;oldid=prev"/>
		<updated>2012-06-10T17:44:35Z</updated>

		<summary type="html">&lt;p&gt;&lt;/p&gt;
&lt;table style=&quot;background-color: #fff; color: #202122;&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision as of 17:44, 10 June 2012&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l128&quot;&gt;Line 128:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 128:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* [http://gigaom.com/2010/08/01/meet-big-data-equivalent-of-the-lamp-stack/ Meet the Big Data Equivalent of the LAMP Stack]&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* [http://gigaom.com/2010/08/01/meet-big-data-equivalent-of-the-lamp-stack/ Meet the Big Data Equivalent of the LAMP Stack]&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-deleted&quot;&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-deleted&quot;&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;[[Category:Phylotastic]]&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>Hilmar</name></author>
	</entry>
	<entry>
		<id>https://www.evoio.org/w/index.php?title=Phylotastic/Datastore&amp;diff=3096&amp;oldid=prev</id>
		<title>Hilmar: /* Datastore subgroup */</title>
		<link rel="alternate" type="text/html" href="https://www.evoio.org/w/index.php?title=Phylotastic/Datastore&amp;diff=3096&amp;oldid=prev"/>
		<updated>2012-06-07T17:39:13Z</updated>

		<summary type="html">&lt;p&gt;&lt;span dir=&quot;auto&quot;&gt;&lt;span class=&quot;autocomment&quot;&gt;Datastore subgroup&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;
&lt;table style=&quot;background-color: #fff; color: #202122;&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision as of 17:39, 7 June 2012&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l2&quot;&gt;Line 2:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 2:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;The datastore subgroup is keeping notes on a [https://docs.google.com/document/d/1zj601OUQWqh5I-5weo9v3qjlzhNWeh1hunTYGBKfeLA/edit google doc]. The rest of this page is pre-hackathon notes about potential approaches.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;The datastore subgroup is keeping notes on a [https://docs.google.com/document/d/1zj601OUQWqh5I-5weo9v3qjlzhNWeh1hunTYGBKfeLA/edit google doc]. The rest of this page is pre-hackathon notes about potential approaches.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;[[Image:rdf_model.png|&lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;300px&lt;/del&gt;]]&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;[[Image:rdf_model.png|&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;800px&lt;/ins&gt;]]&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;= About =&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;= About =&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>Hilmar</name></author>
	</entry>
	<entry>
		<id>https://www.evoio.org/w/index.php?title=Phylotastic/Datastore&amp;diff=3095&amp;oldid=prev</id>
		<title>Hilmar: moved PhylotasticDatastore to Phylotastic/Datastore</title>
		<link rel="alternate" type="text/html" href="https://www.evoio.org/w/index.php?title=Phylotastic/Datastore&amp;diff=3095&amp;oldid=prev"/>
		<updated>2012-06-07T17:38:42Z</updated>

		<summary type="html">&lt;p&gt;moved &lt;a href=&quot;/wiki/PhylotasticDatastore&quot; class=&quot;mw-redirect&quot; title=&quot;PhylotasticDatastore&quot;&gt;PhylotasticDatastore&lt;/a&gt; to &lt;a href=&quot;/wiki/Phylotastic/Datastore&quot; title=&quot;Phylotastic/Datastore&quot;&gt;Phylotastic/Datastore&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;b&gt;New page&lt;/b&gt;&lt;/p&gt;&lt;div&gt;=Datastore subgroup=&lt;br /&gt;
The datastore subgroup is keeping notes on a [https://docs.google.com/document/d/1zj601OUQWqh5I-5weo9v3qjlzhNWeh1hunTYGBKfeLA/edit google doc]. The rest of this page is pre-hackathon notes about potential approaches.&lt;br /&gt;
&lt;br /&gt;
[[Image:rdf_model.png|300px]]&lt;br /&gt;
&lt;br /&gt;
= About =&lt;br /&gt;
&lt;br /&gt;
Notes on datastore options for hosting the trees that underlie the Phylotastic tree-pruning service.&lt;br /&gt;
&lt;br /&gt;
= Requirements = &lt;br /&gt;
&lt;br /&gt;
A place to document requirements for a phylotastic data store.&lt;br /&gt;
&lt;br /&gt;
=== Ontology ===&lt;br /&gt;
&lt;br /&gt;
Should the datastore include ontology support? This would allow for flexible markup of node attributes, as well as flexible attribution of source trees.&lt;br /&gt;
&lt;br /&gt;
=== Nodes and edges ===&lt;br /&gt;
&lt;br /&gt;
Will the data store require nodes and edges to be stored in separate tables (if SQL is used)?&lt;br /&gt;
&lt;br /&gt;
= Datastores =&lt;br /&gt;
&lt;br /&gt;
== RDBMS ==&lt;br /&gt;
&lt;br /&gt;
=== Options for trees/hierarchies in RDBMS ===&lt;br /&gt;
&lt;br /&gt;
This documentation is copied from [http://stackoverflow.com/questions/4048151/what-are-the-options-for-storing-hierarchical-data-in-a-relational-database options for storing hierarchical data in a relational database].&lt;br /&gt;
&lt;br /&gt;
* [http://en.wikipedia.org/wiki/Adjacency_list Adjacency List]:&lt;br /&gt;
** Columns: ID, ParentID&lt;br /&gt;
** Easy to implement.&lt;br /&gt;
** Cheap node moves, inserts, and deletes.&lt;br /&gt;
** Expensive to find level (can store as a computed column), ancestry &amp;amp; descendants (Bridge Hierarchy combined with level column can solve), path (Lineage Column can solve).&lt;br /&gt;
** Use Common Table Expressions in those databases that support them to traverse.&lt;br /&gt;
* [http://en.wikipedia.org/wiki/Nested_set_model Nested Set] (a.k.a. Modified Preorder Tree Traversal)&lt;br /&gt;
** First described by Joe Celko and covered in depth in his book ''Trees and Hierarchies in SQL for Smarties''.&lt;br /&gt;
** Columns: Left, Right&lt;br /&gt;
** Cheap level, ancestry, descendants&lt;br /&gt;
** Moves, inserts, and deletes are more expensive than with an Adjacency List.&lt;br /&gt;
** Requires a specific sort order (e.g. created). So sorting all descendants in a different order requires additional work.&lt;br /&gt;
* [https://communities.bmc.com/communities/docs/DOC-9902 Nested Intervals]&lt;br /&gt;
** Combination of Nested Sets and Materialized Path where left/right columns are floating point decimals instead of integers and encode the path information. In the later development of this idea nested intervals gave rise to [http://vadimtropashko.files.wordpress.com/2011/07/ch5.pdf matrix encoding].&lt;br /&gt;
* [http://www.informationweek.com/news/219400252 Bridge Table] (a.k.a. [http://dirtsimple.org/2010/11/simplest-way-to-do-tree-based-queries.html Closure Table]: some good ideas about how to use triggers for maintaining this approach)&lt;br /&gt;
** Columns: ancestor, descendant&lt;br /&gt;
** Stands apart from table it describes.&lt;br /&gt;
** Can include some nodes in more than one hierarchy.&lt;br /&gt;
** Cheap ancestry and descendants (albeit without ordering information)&lt;br /&gt;
** For complete knowledge of a hierarchy needs to be combined with another option.&lt;br /&gt;
* [http://evolt.org/node/4047/ Flat Table]&lt;br /&gt;
** A modification of the Adjacency List that adds a Level and Rank (e.g. ordering) column to each record.&lt;br /&gt;
** Expensive move and delete&lt;br /&gt;
** Cheap ancestry and descendants&lt;br /&gt;
** Good Use: threaded discussion - forums / blog comments&lt;br /&gt;
* [http://www.ferdychristant.com/blog//articles/DOMM-7QJPM7 Lineage Column] (a.k.a. [https://communities.bmc.com/communities/docs/DOC-9902 Materialized Path], Path Enumeration)&lt;br /&gt;
** Column: lineage (e.g. /parent/child/grandchild/etc...)&lt;br /&gt;
** Limit to how deep the hierarchy can be.&lt;br /&gt;
** Descendants cheap (e.g. LEFT(lineage, #) = '/enumerated/path')&lt;br /&gt;
** Ancestry tricky (database specific queries)&lt;br /&gt;
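&lt;br /&gt;
The Adjacency List option above, traversed with a Common Table Expression, can be sketched as follows. This is only an illustration under stated assumptions: SQLite (via Python's sqlite3 module) stands in for MySQL or PostgreSQL, and the table name node, its columns, and the toy tree are hypothetical rather than part of any Phylotastic schema.&lt;br /&gt;
&lt;br /&gt;
```python
import sqlite3

# Hypothetical toy tree: root has children A and B; A has children A1 and A2.
# Each row is (id, parent_id, label); the root has no parent.
ROWS = [
    (1, None, "root"),
    (2, 1, "A"),
    (3, 1, "B"),
    (4, 2, "A1"),
    (5, 2, "A2"),
]


def build_db():
    """Create an in-memory adjacency-list table (columns: id, parent_id, label)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE node (id INTEGER PRIMARY KEY, parent_id INTEGER, label TEXT)"
    )
    conn.executemany("INSERT INTO node VALUES (?, ?, ?)", ROWS)
    return conn


def descendants(conn, node_id):
    """Return the labels of all descendants of node_id, ordered by id."""
    sql = """
        WITH RECURSIVE sub(id, label) AS (
            -- Anchor: the direct children of the starting node.
            SELECT id, label FROM node WHERE parent_id = ?
            UNION ALL
            -- Recursive step: children of rows already collected.
            SELECT n.id, n.label FROM node AS n JOIN sub ON n.parent_id = sub.id
        )
        SELECT label FROM sub ORDER BY id
    """
    return [row[0] for row in conn.execute(sql, (node_id,))]


conn = build_db()
print(descendants(conn, 2))
print(descendants(conn, 1))
```
&lt;br /&gt;
The recursive query walks one level per iteration, which is why finding descendants is listed as expensive for this model on databases without CTE support.&lt;br /&gt;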
&lt;br /&gt;
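For comparison, the Nested Set option can be sketched in the same way. Again this is a minimal illustration, not a proposed design: SQLite stands in for the real RDBMS, the table name nested and the columns lft and rgt are hypothetical, and the bounds are precomputed by hand for a five-node toy tree.&lt;br /&gt;
&lt;br /&gt;
```python
import sqlite3

# Hypothetical tree with hand-computed preorder bounds (label, lft, rgt):
# root(1,10) contains A(2,7) and B(8,9); A contains A1(3,4) and A2(5,6).
ROWS = [
    ("root", 1, 10),
    ("A", 2, 7),
    ("A1", 3, 4),
    ("A2", 5, 6),
    ("B", 8, 9),
]


def build_db():
    """Create an in-memory nested-set table (columns: label, lft, rgt)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE nested (label TEXT, lft INTEGER, rgt INTEGER)")
    conn.executemany("INSERT INTO nested VALUES (?, ?, ?)", ROWS)
    return conn


def subtree(conn, label):
    """Labels of the node and all of its descendants, in preorder."""
    sql = """
        SELECT d.label FROM nested AS a JOIN nested AS d
          ON d.lft BETWEEN a.lft AND a.rgt
        WHERE a.label = ? ORDER BY d.lft
    """
    return [row[0] for row in conn.execute(sql, (label,))]


def ancestors(conn, label):
    """Labels of the node and all of its ancestors, root first."""
    sql = """
        SELECT a.label FROM nested AS a JOIN nested AS d
          ON d.lft BETWEEN a.lft AND a.rgt
        WHERE d.label = ? ORDER BY a.lft
    """
    return [row[0] for row in conn.execute(sql, (label,))]


conn = build_db()
print(subtree(conn, "A"))
print(ancestors(conn, "A2"))
```
&lt;br /&gt;
Both lookups are a single non-recursive join, which is the cheap ancestry and descendants property noted in the list; the cost moves to inserts and moves, which must renumber the bounds.&lt;br /&gt;
&lt;br /&gt;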
=== RDBMS ===&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot; style=&amp;quot;text-align: left&amp;quot;&lt;br /&gt;
! Feature&lt;br /&gt;
! MySQL&lt;br /&gt;
! PostgreSQL&lt;br /&gt;
|-&lt;br /&gt;
| Max allowed packet (a limit on the list of species for trimming the tree)&lt;br /&gt;
| By default this is 1 MB but can be increased&lt;br /&gt;
| &lt;br /&gt;
|-&lt;br /&gt;
| Enforced referential integrity&lt;br /&gt;
| Yes ([http://www.sitepoint.com/mysql-innodb-table-pros-cons/ with InnoDB tables], but these are generally slower than [http://www.kavoir.com/2009/09/mysql-engines-innodb-vs-myisam-a-comparison-of-pros-and-cons.html MyISAM tables])&lt;br /&gt;
| Yes&lt;br /&gt;
|-&lt;br /&gt;
| Stored procedures&lt;br /&gt;
| Yes (since [http://dev.mysql.com/doc/refman/5.0/en/stored-routines.html MySQL 5.0]). However, stored procedures do not accept arrays (i.e. a list of leaf nodes) as parameters.&lt;br /&gt;
| Yes&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
=== Schema ===&lt;br /&gt;
&lt;br /&gt;
The following schemas support storing trees.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot; style=&amp;quot;text-align: left&amp;quot;&lt;br /&gt;
! Schema&lt;br /&gt;
! Supported RDBMS&lt;br /&gt;
! Ontology Support&lt;br /&gt;
|-&lt;br /&gt;
|[http://www.biosql.org/wiki/Extensions BioSQL::Phylo]&lt;br /&gt;
| PostgreSQL&lt;br /&gt;
| Yes&lt;br /&gt;
|-&lt;br /&gt;
|[http://www.ensembl.org/info/docs/api/compara/compara_schema.html Ensembl Compara]&lt;br /&gt;
|PostgreSQL, MySQL&lt;br /&gt;
|No, but it is possible with [https://github.com/jestill/iplant-treerec/blob/master/schema/tr_schema_mysql.sql schema extensions] that add [http://gmod.org/wiki/Chado_CV_Module Chado controlled vocabulary tables].&lt;br /&gt;
|-&lt;br /&gt;
|[http://gmod.org/wiki/Chado_Phylogeny_Module Chado::Phylogeny]&lt;br /&gt;
|PostgreSQL, MySQL&lt;br /&gt;
|Yes (OBO, OWL with conversion to OBO)&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
=== Additional Information ===&lt;br /&gt;
&lt;br /&gt;
* [http://vadimtropashko.wordpress.com/2008/08/09/one-more-nested-intervals-vs-adjacency-list-comparison/ Good overview table of Adjacency Lists vs. Nested Intervals]&lt;br /&gt;
* [http://mikehillyer.com/articles/managing-hierarchical-data-in-mysql/ Managing Hierarchical Data in MySQL]&lt;br /&gt;
* [http://www.sitepoint.com/hierarchical-data-database/ Storing Hierarchical Data in a Database]&lt;br /&gt;
* [http://www.slideshare.net/billkarwin/models-for-hierarchical-data Models for Hierarchical Data (SlideShare)]&lt;br /&gt;
* [http://troels.arvin.dk/db/rdbms/links/#hierarchical More links on hierarchical data in SQL]&lt;br /&gt;
&lt;br /&gt;
== NoSQL ==&lt;br /&gt;
&lt;br /&gt;
Examples of [http://www.mongodb.org/display/DOCS/Trees+in+MongoDB storing trees in MongoDB].&lt;br /&gt;
&lt;br /&gt;
== Hadoop ==&lt;br /&gt;
&lt;br /&gt;
=== HBase ===&lt;br /&gt;
&lt;br /&gt;
HBase is an open-source, distributed, versioned, column-oriented store modeled after Google's Bigtable, as described in ''Bigtable: A Distributed Storage System for Structured Data'' by Chang et al. Just as Bigtable leverages the distributed data storage provided by the Google File System, HBase provides Bigtable-like capabilities on top of Hadoop and HDFS.&lt;br /&gt;
&lt;br /&gt;
* [https://hbase.apache.org/ Apache HBase Home]&lt;br /&gt;
&lt;br /&gt;
=== Tolomatic ===&lt;br /&gt;
&lt;br /&gt;
[https://github.com/phylotastic/tolomatic Rutger's documentation on using MapReduce]&lt;br /&gt;
&lt;br /&gt;
=== Additional Information ===&lt;br /&gt;
&lt;br /&gt;
* [http://gigaom.com/2010/08/01/meet-big-data-equivalent-of-the-lamp-stack/ Meet the Big Data Equivalent of the LAMP Stack]&lt;/div&gt;</summary>
		<author><name>Hilmar</name></author>
	</entry>
</feed>