HIP LT Meetings
Revision as of 03:08, 22 January 2012
Process
Links to writings on virtual meetings and virtual teams:
- 20 Simple Ways to Improve Virtual Meetings (white paper by Interaction Associates)
- Starting up a Virtual Team
- Running an Effective Teleconference or Virtual Meeting
- Leading Virtual Teams
- How to manage virtual teams (Wikibooks)
- How to Manage Virtual Teams (MIT Sloan Management Review)
Face-to-face meetings
Inaugural F2F Meeting
HIP LT meeting, NESCent, Jan 19-21
Day 1 presentations
Introductions and Overview
- 9:00 NESCent introductions
Karen Cranston: Current trajectories of big projects like TreeBASE, iPlant, and ToLWeb
- iPlant Collaborative
- Discovery Environment for phylogenetic and genomics workflows, easy incorporation of external tools
- taxonomic name reconciliation, TNRS: open source on GitHub
- large tree viewer: open source on GitHub
- Phyloinformatics Research Foundation and TreeBASE, ToLWeb
- NSF ABI proposal: declined Jan 2012
- OpenPhylo (recommended for funding Jan 2012 through AVAToL) Project Summary
- EOL
- allows for switching between taxonomies
- upcoming contest to provide phylogenetic organizing structure as a Darwin Core Archive
- input API proposed to allow for pushing content to EOL
Other resources that came up in the discussion:
- Phylomatic
- CIPRES: HPC resources for phylogenetic analyses
- VertNet
- Map of Life
- citizen science sites: iNaturalist, eBird
- TimeTree: searchable database of divergence times
Mark Wilkinson: Hackathons: what they are, what they have accomplished and can accomplish, what works and what doesn't, good practices
Enrico Pontelli: The EvoIO stack technology and its current status
Day 1 discussions
some thoughts to guide discussion
goals and motivators -- which are most important?
- improved interoperability ("building links in an emerging network of interoperable phylogenetic resources")
- world domination, impact (e.g., penetration of "stack" technologies)
- providing growth experiences to participants (skills, networking, successes)
- exemplifying good practices in scientific data-sharing
Hackathon conceptions differ along the following dimensions, relative to what has been traditional at NESCent:
dimension | traditional | alternative |
---|---|---|
Edginess | encourage edgy, creative projects that result in demo software | encourage incremental practical fixes to production code |
Spontaneity | organizers only choose participants; participants choose projects using open-space principles | organizers coordinate the hackathon toward a single over-arching goal |
Impact horizon | focus on long-term community-wide impact of promoting best practices in interop | focus on short-term gains for users and stakeholders |
group discussion
- Need to define the topics of hackathons, in terms of the order and big picture.
- Need also to meet the expectations of the funding entities.
Reviewing the proposal. Start from the original vision.
First concern: the first three resources mentioned in Hackathon 1 differ greatly in their level of support and in how difficult they will be to integrate:
- TreeBASE is almost ready
- Dryad is a problem because of the diversity of data
- ToLWeb is not there yet, but might not be too hard to develop
EOL has no phylogenetic content, but with a to/from API this could be added. This could be Hackathon 2: integrating trees into EOL, with EOL content used to decorate trees.
Hackathon 1: the issue is integrating APIs and data formats. We could use NeXML as the format. The hackathon could also indicate what is missing in the standards.
Question: there are lots of biodiversity applications and metagenomics integration; this is a big application domain.
We should think about how the tools to be built can handle these data. Taxonomy is a resource that people use; could those users be interested in this?
Hackathon 1 is also a bit vague: the problem is that we do not talk about what we want to do with the data. Maybe Hackathon 1 should be more focused, and Hackathon 2, instead of being a single project, could generalize Hackathon 1 to other data sources.
Question: Is dealing with the data resources the main problem, or is the problem something else? People don't use TreeBASE extensively, so what is the buy-in from the community? For example, if we look at the enumeration of resources, many data resources are unfunded, may disappear, have little use, or are closed source and isolated. So the impact could be limited, and there is a risk that this may fail. Another worry: what's the use case? What are people trying to solve?
Maybe we should validate NeXML implementations instead: ensure that all the implementations are correct and interoperable, identify packages that don't use NeXML, and make it easier for them to adopt it (e.g., converters).
The developer of one of those pieces of software is in the room, and together we create the shim services.
If we stick to the current structure of the proposal, we get nice examples but perhaps no definite production-level code. Hackathon 3 is really the only one that directly produces something for users: an edgy environment for users. E.g., EOL has no phylogenetics, and EOL wants an environment that sits between ToLWeb and EOL to add phylogenies. So it would be nice to have a visual environment that can grab content from phylogenetic resources: it would fetch not only the trees but also different resources for decorating the trees (EOL and others).
Idea: identify a standard for the visualization of phylogenies (so that different visualization tools use the same ontology/notation).
One approach is to define the use cases as the target point and then walk backward from there. The first hackathon should be about naming; identifiers are the main problem to be solved if we want interoperability.
R packages are another target: many different packages (for reading, visualizing, and manipulating trees) do not play well together.
Discussion points about different conceptions of hackathon
DIMENSION EDGINESS: 1. creative projects for demo/proof-of-concept software (that might not go anywhere) 2. incremental fixing of production code (e.g., the round code of Mark)
SPONTANEOUS 1. open space principles 2. coordinate towards a well defined goal
IMPACT 1. long term community wide impact, promote best practices 2. short term gains for users
Let's go back to the big picture
1. Improve interoperability
2. Penetrate the stack technology
3. Grow participants (train, skills, network)
4. Demonstrate good practices in data sharing
We need to rank these aspects. 1 is the most important overarching goal: just add edges among resources. 3 is important and can be made to happen no matter what.
Two views: one is to just establish connections between tools, perhaps picking the most popular tools and working on them. The alternative is to create momentum by getting as many developers as possible to adopt the stack.
Should we look for the root causes of the lack of connections? Why were new genome formats adopted? They were good formats, and they were imposed by the main repository. There are analysis tools for reconciliation and visualization tools for reconciliations, but they do not talk to each other because there is no standard for data exchange.
So why are the links not there? Awareness is a big problem. There are also constraints: NEXUS is used by many tools, and this is an imposition.
E.g., Bio::Phylo provides interconversion between NeXML and NEXUS. Make NeXML the common language for interchange between data formats.
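The appeal of a single interchange language can be made concrete with a rough count. Assuming N formats that must all be interconvertible, direct conversion needs a separate converter for every ordered pair of formats, whereas routing through NeXML as a hub needs only one reader into NeXML and one writer out of NeXML per format:

$$C_{\text{direct}} = N(N-1), \qquad C_{\text{hub}} = 2N$$

For N = 6 that is 30 converters versus 12. The saving assumes NeXML can carry everything the other formats carry; anything it cannot express is lost on the way through the hub.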
Day 2 Discussions
agenda for Day 2: Hackathon planning
- 9:00: review from yesterday
- discussion from afternoon
- the emerging network
- types of hackathons
- 10:00: continue with grand strategy
- strategic goals
- 3-hackathon plan (can be open-ended)
- 1:30 decision for the first hackathon
- 2:00 plan for the first hackathon
- Clarification of topic and objective
- Clarification of duties and distribution of tasks
- Publicity and project tracking
- Broader plan for advertising
- 5:00: Shuttles back to hotel
reflections on previous day, what's important
Fish Mark:
- keep the eye on the big picture
- keep the eye on the applications and the users
- keep the eye on funding
Karen:
- Having users and developers in the room sometimes works and sometimes does not
- If we have a hackathon on a specific target, then we need community engagement in choosing the target
- A common complaint from users: hackathons don't really address the immediate needs of users. Can we do something that has immediate value to users?
Brian
- We are trying to convince a large number of users to adopt something new (new format, new standards)
- To be successful we need a killer app. It has to be slick and efficient. We are not really talking about what the key app is; we need to think about a major problem that we are going to solve in a slick way.
Sergey
- We are trying to do the right thing, but everyone is coming in from a different view; there is a risk of not accomplishing much
- We need a strong leadership perspective: the core of the LT should push a specific idea.
- Also agrees on the killer app; there should also be something that targets strong funding opportunities
Rutger
- Deja-vu from evoInfo working group (but with a different group of people and different types of expertise)
- Are we still dancing around the same issue that has not been solved?
- Likes the BabelPhysh idea; it could be very useful if easy to use, especially in the context of developing workflows
- It is not pushing us to world domination but it could be one step forward.
Semantic Mark
- BabelPhysh is not the right solution. Centralizing is not a good solution.
- We do not have sufficient access to existing resources or code bases to get them to do the right thing by themselves
- We are taking a sideways approach to introducing a new and better standard; this is uncomfortable
- If we cast it as an intermediary step towards the introduction of a new standard, and then develop new standards with its tools, we may get an easier acceptance; we need to see what makes a new API easy to use.
Hilmar
- Long history with hackathons; they have been enjoyable and all followed a similar pattern: an open-space model, with the focus narrowed only in a limited way; they were coherent but open.
- At the same time, they are cumulative; they do not necessarily take us to products that people will follow up on and widely use
- This working group is not a continuation of NESCent Hackathons program, this is an unrestricted working group that can change the model; we can take new approaches and new directions.
- keep in mind how these standards will be adopted
- reduce the barriers to allow the use of the stack; the stack is good but nobody is using it. It is simple engineering at this point
- e.g., how do we get RAxML to use our stack? There are barriers in the way of adoption.
Arlin
- Two goals in the Hackathons; long term solution of interoperability and train people/build network. The second goal has been successful, but we have not made strides in the first one.
- We need intermediate strategies for moving forward; finding the killer-app, following the money are important aspects.
- Rethinking the hackathon model, moving from open space to a more top-down model with a dominating agenda, is something that we should pursue; that is why we have an LT with diverse backgrounds.
Further Discussion
- What makes a standard successful?
- people must see the value
- are the big players involved?
- Need to find excitement in the evoIO stack; interoperability is an idea, not a need
- Maybe it would be better to build a big thing in the middle that harnesses all the data resources. Maybe a Universal Adapter?
- Remember that NEXUS failed because it was not expressive enough to say everything needed (and people started hacking it for this reason). The stack addresses this problem. This also means that a project that round-trips data through existing formats will not be particularly useful (we will lose information that those formats cannot capture).
- We often think about interoperability in terms of data providers/consumers; we can create an eco-system for tool writers (analysis, data management, visualizers) - if you write software according to certain standards, by default it will immediately be able to talk to many many other systems. Very compelling vision. That is why people use Mesquite - it is an eco-system where you can bring your data and do lots of things with it.
- Perhaps we should be the ones selecting the winner and then target the most used tools/repositories to make an impact.
- hackathon should produce reusable code. Don't focus only on the research question.
- Taverna and Galaxy are important lessons: they solve problems, but not many people use them. How do we convince the community to use them? Again, PAUP was popular because you could do so much in it.
- Technologies in the stack are standards (W3C-based). They allow the use of off-the-shelf (OTS) tools.
- Use existing environments (Galaxy or whatever) and use them to solve an existing problem (e.g., reconcile tree problem). That will attract attention.
- Who do we target? Developers or Users? This is tough. The community of developers is very fluid.
- We are talking about taking files from format A to B. With journal requirements, we are seeing a greater need to deposit files into repositories. Thus, for users NeXML is nice: it captures data and experiments in a form that can be placed in a repository. Dryad is easy (no need for annotation), but what about TreeBASE? Or even Dryad, in a way that is reusable? Can we address that issue?
- good idea; opportunistic about looking for an impact
- but Dryad accepts everything, no community standards, thus the bar is very low
- on the other hand, there are journals that will require TreeBASE archiving. Furthermore, there is an incentive in data sharing: if the data are reusable, you may get citations, so there is an incentive to go beyond Dryad
- We need to tap into what users want to do. Interface phylogenies with other types of data. E.g., for education.
- MIAPA support seems to be a good idea; it is something users may want to have (e.g., keeping the parameters and the accession names).
- Visualization programmers may see the value of NeXML as they can maintain annotations in the file, useful for visualization.
- Consider a matrix of languages and components of the stack
- NeXML: ++ support in Perl, Java; some support in Ruby; low support in JS and in Python; Need to fill the table with ++
- Programming foundations are necessary for providers to serve NeXML data
Going back to the root of the problem
- what is the need for web services and phylogenies? We need to repeat the success of the Phylomatic approach - the creation of the big synthetic tree.
- publications are filtered into APWeb 3, and Phylomatic can get the trees to the world
- the creation of the big tree in APWeb 3 is only for plants - it is not repeated for other domains
- the APWeb 3 is successful because it is very comprehensive
A proposed statement of principles guiding strategy
- we want to explore targeted, coordinated hackathons (not open-space anything-goes hackathons); we accept the likely reduction in edginess
- we want to improve on achievement of tangible outcomes by having an end-point goal for each hackathon that is completed by the last day (not by having followup as per the previous plan)
- we will not simply build a technology base in the hopes that people will use it: we will be opportunistic about leveraging the potential for high-impact problems, well funded problems, and immediate challenges faced by the user community
- when we address an immediate challenge, we will do so using technology with long-term potential (rather than just hacking a solution).
reviewing & evaluating list of possible hackathons
First we review the projects, then there are decisions to be made
- Order of the projects
- Impact
- on the community
- on the development of the stack infrastructure
- External resources needed
- Tree reconciliation problem - generalize and automate
- extend NeXML
- wrap-up applications in services
- BabelPhysh
- requires preliminary identification of the data formats
- ToL-o-matic
- Use Phylomatic, either using APWeb 3, or the mammals tree or the fish tree; Phylomatic is a success story, widely used.
- Wrap Phylomatic into a web service using PhyloWS (pruning, grafting)
- Add a name reconciliation service to validate species names submitted by the user (e.g., iPlant one)
- Add a re-calibration service (distances)
- Integration of the tree with other data (e.g., data from a table) - e.g., you have a table of data, extract species names, execute pruning and recalibration and then combine the tree with the other data in the table
- Add a scripted visualization at the end to show pictures, nearest specimens, EOL links
- MIAPA compliant submission support
- ask developers (e.g., of MEGA, PAUP, RAxML) to add support in their programs for generating MIAPA metadata (as part of NeXML or as a separate file)
- or an annotation tool that allows users to manually add the information to existing files (e.g., as part of BabelPhysh: pop up a window asking for the missing information)
- Tie NeXML to visualization - e.g., add to NeXML annotations that are guiding visualization (e.g., a visualization language that is embedded within NeXML)
- People need pretty pictures to put in the paper
- The more information we can provide in the file (e.g., not only a branch should be green, but explain why) the better
- Move back from a desired picture to identify what are the features that are missing in NeXML to generate the figure
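The table-integration step of the ToL-o-matic pipeline above (extract species names from a table, then combine the pruned tree with the table's data) can be sketched as a species-wise join. This is a minimal Python sketch under stated assumptions, not a design the group agreed on: `normalize_name` and `join_table_to_tips` are hypothetical names, `normalize_name` is a deliberately crude stand-in for a real name reconciliation service such as the iPlant TNRS (it only handles underscores, extra whitespace, and capitalization), and the `species` column name is an assumption.

```python
from __future__ import annotations

import csv
import io


def normalize_name(name: str) -> str:
    """Hypothetical stand-in for a TNRS call: 'homo_sapiens ' -> 'Homo sapiens'."""
    words = name.replace("_", " ").split()
    if not words:
        return ""
    # Capitalize the genus, lower-case the remaining epithets.
    return " ".join([words[0].capitalize()] + [w.lower() for w in words[1:]])


def join_table_to_tips(
    csv_text: str, tip_labels: list[str], name_column: str = "species"
) -> tuple[dict[str, dict[str, str]], list[str]]:
    """Match table rows to tree tips by normalized species name.

    Returns (rows keyed by the matching tip label, table names with no tip).
    """
    tips_by_name = {normalize_name(tip): tip for tip in tip_labels}
    matched: dict[str, dict[str, str]] = {}
    unmatched: list[str] = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = normalize_name(row[name_column])
        if key in tips_by_name:
            matched[tips_by_name[key]] = row
        else:
            unmatched.append(row[name_column])
    return matched, unmatched


if __name__ == "__main__":
    # Illustrative table and tip labels only, not real data.
    table = "species,trait\nhomo sapiens,1\nPan_troglodytes,2\nGorilla gorilla,3\n"
    tips = ["Homo_sapiens", "Pan_troglodytes", "Pongo_abelii"]
    matched, unmatched = join_table_to_tips(table, tips)
    print(sorted(matched))  # ['Homo_sapiens', 'Pan_troglodytes']
    print(unmatched)        # ['Gorilla gorilla']
```

The matched tip labels are the tips to hand to the pruning step; unmatched names are the ones a real TNRS would need to resolve, or report back to the user.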
Rank projects according to the four criteria
- 1. BabelPhysh:
- Order:
- Impact on community: ++
- External conditions:
- Impact on infrastructure: toolkits development for NeXML; implementation of some PhyloWS aspects;
- Prerequisites: list of data formats, test files
- 2. ToL-o-matic:
- Order: could come after 4
- Impact on community: +++
- External conditions: benefit from big trees available
- Impact on infrastructure: predominantly PhyloWS
- Prerequisites:
- 3. MIAPA compliant submission support
- Order: after 1
- Impact on community: +
- External conditions: Checklist from the TDWG meeting; could benefit from stronger language support from MIAPA
- Impact on infrastructure:
- Prerequisites: very high; NeXML/CDAO development; also need PhyloWS support to handle MIAPA annotations
- 4. Vis language for NeXML
- Order:
- Impact on community: +++
- External conditions:
- Impact on infrastructure: high development for NeXML; also need some CDAO concepts
- Prerequisites:
- 5. Reconcile tree problem
- Order: after 2
- Impact on community: ++
- External conditions:
- Impact on infrastructure: predominantly PhyloWS; some CDAO (describe duplications, extensions, etc.)
- Prerequisites:
administrative (not scientific) planning of the first hackathon
- Dates:
- February 17th: posting of all the advertisements
- February 17th: all the invitations have been communicated
- March 4th: deadline for applications
- March 14th: deadline for scoring all the applications
- April 2nd: deadline for selection of applicants
- potentially first two weeks of June; options also at the end of July and end of August
- Prepare advertisement
- Mark prepares a 100-word blurb describing the event and announcing the dates for advertisements and deadlines
- Where to Submit and Who submits:
- Karen: NESCent newsletter
- Hilmar: NESCent Phyloinformatics mailing list
- Semantic Mark: OpenBio list and GMOD developers
- Arlin: EvolDir
- Brian: Facebook
- Fish Mark: BioSync
- Hilmar: EcoLog
- Press Releases:
- Create a "buzz" before the event (social media?); maybe a Google+ page
- Post on Blogs (a week before the event)
- Arlin: Rod Page's Blog
- Brian: Dechronization
- Develop surveys to engage community in the selection of features
- Karen: After the hackathon, contact Robin for a press release
- Develop slides for presentations
- Participate in meetings and make presentations
- Phyloseminar online seminars
- #phylotastic
scientific planning of first hackathon
1. Scoping
- In Scope
- Populating data store of existing trees
- Evolution of PhyloWS to support the needs of Phylomatic
- Taxonomic name resolution (embedding existing TNRS capacities)
- Pruning trees and grafting species on them
- Branch length (existing methods for incorporating branch lengths)
- Integration of data and trees (e.g., mashups) - species-wise integration
- Display of resulting trees (using existing technologies)
- Wrap all these existing tools as web services
- NeXML syntax extensions if needed
- If needed, determine methods for compressing NeXML representations
- Simple user interface (web form)
- Not In Scope
- Constructing new input trees
- New Data Generation
- Arguing or evaluating the correctness of trees
- Design of new TNRS systems
- Debates about which naming system is best
- Developing new techniques to derive branch lengths
- Uncertain
- Phylo-referencing
- MIAPA annotations of the steps/provenance annotations
2. Mission Statement
- It's PhyloTastic, 'Trees for Everyone'
- At the First HIP Hackathon, an elite team of scientific programmers will
- make trees accessible and computable
- lower/remove barriers
Day 3 Discussions
Agenda: Wrap up, action items, closing thoughts
- 9:00: Leftovers from previous day
- another walk through the tol-o-matic project
- 10:30 - administrative issues
- wiki - organization and up-date
- publicity and advertising
- confidentiality
- 11:30 to 2:00 -- remaining issues
- strategy for HIP to become a bigger or longer-term group, develop its own brand and image, embark in other projects
- funding and other opportunities
- 2:00 Departures to airport begin (Enrico's flight at 16:00, Semantic Mark 17:25, Sergei 18:30)
Administrative Issues
- Enrico will create a Wiki page for the hackathon
- Karen will set up a mailing list for the participants to the Hackathon (HIP_hackathon)
- Mark will create a blurb for the event
- Brian will create an invitation letter for the hackathon in Google Docs (invited participants)
- Rutger will create an open call for the hackathon in Google Docs (open call for participants)
- Arlin will be in charge of dates and scheduling
Review of the ToL-o-matic Project
- Project statement:
Develop a collection of services to extract a subtree (specified by tips) from any large species tree, and to provide branch lengths, provenance annotations, and some other useful annotations.
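The core operation in this statement, extracting the subtree spanned by a set of tips, can be sketched in a few lines. This is a minimal Python illustration under stated assumptions, not Phylomatic's or PhyloWS's actual implementation: it parses only simple Newick (no quoted labels, comments, or whitespace between tokens), and `Node`, `parse_newick`, `prune`, and `to_newick` are hypothetical names. Internal nodes left with a single child are collapsed and their branch lengths summed, so the path length between any two retained tips is unchanged.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """A tree node: tip or internal, with an optional branch length."""
    label: str = ""
    length: float | None = None
    children: list[Node] = field(default_factory=list)


def parse_newick(text: str) -> Node:
    """Parse simple Newick (no quoted labels, comments, or internal whitespace)."""
    text = text.strip()
    pos = 0

    def parse_node() -> Node:
        nonlocal pos
        node = Node()
        if text[pos] == "(":
            pos += 1  # consume "("
            node.children.append(parse_node())
            while text[pos] == ",":
                pos += 1  # consume ","
                node.children.append(parse_node())
            pos += 1  # consume ")"
        start = pos
        while pos < len(text) and text[pos] not in ",():;":
            pos += 1
        node.label = text[start:pos]
        if pos < len(text) and text[pos] == ":":
            pos += 1  # consume ":" and read the branch length
            start = pos
            while pos < len(text) and text[pos] not in ",();":
                pos += 1
            node.length = float(text[start:pos])
        return node

    return parse_node()


def prune(node: Node, keep: set[str]) -> Node | None:
    """Return a pruned copy containing only tips in `keep`, or None if none remain."""
    if not node.children:
        return Node(node.label, node.length) if node.label in keep else None
    kept = [c for c in (prune(child, keep) for child in node.children) if c is not None]
    if not kept:
        return None
    if len(kept) == 1:
        # Collapse a node left with one child, summing the two branch lengths.
        only = kept[0]
        if node.length is None and only.length is None:
            total = None
        else:
            total = (node.length or 0.0) + (only.length or 0.0)
        return Node(only.label, total, only.children)
    return Node(node.label, node.length, kept)


def to_newick(node: Node) -> str:
    """Serialize a Node back to Newick (without the trailing ';')."""
    out = ""
    if node.children:
        out += "(" + ",".join(to_newick(c) for c in node.children) + ")"
    out += node.label
    if node.length is not None:
        out += f":{node.length:g}"
    return out


if __name__ == "__main__":
    tree = parse_newick("((A:1,B:2):1,(C:1,D:1):2);")
    print(to_newick(prune(tree, {"A", "C", "D"})) + ";")  # (A:2,(C:1,D:1):2);
```

Here B's removal leaves its sister A as the only child of their parent, so the parent is collapsed and A's branch becomes 1 + 1 = 2.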
Teleconferences
December 16, 2011
Agenda
- Mark's relocation to Spain - need to raise the issue but do not need to make any decision
- Preparing for the January f2f meeting.
- solidify good working relationships as a team
- develop a clear plan for the first hackathon (2012 spring or summer)
- clarify expectations (as needed) about type and amount of work each of us will do
- develop a provisional strategy for the whole 2-year 3-hackathon project
- commit to agreed-upon practices for info management, publicity, and project tracking
- Report on the PRF board meeting (Rutger)
Minutes
Present: Arlin, Enrico, Karen, Rutger, Sergei, Semantic Mark (notes by Karen, at least for the first part of the call; revised and completed by Enrico)
- Semantic Mark's move to Spain: how does this change things?
- Increased participant cost not so much an issue from NESCent's perspective
- hackathon in Spain? Mark W will think about finding a co-sponsor in Spain once he gets over there
- First f2f meeting in January
- Discussion of what should be the agenda of the meeting and what preparatory steps are needed
- The meeting should aim at organizing the first hackathon and lay the foundations for the successive ones
- Proposal indicates Data Resources as the topic of the first hackathon - but the group agrees that, with the full LT on board, we should spend some time reviewing the proposal and discussing whether we want to have a different order of activities in the hackathon. We will limit the discussion to half day.
- Everyone should re-read the proposal before the meeting
- Everyone should also start identifying potential participants associated with the main topics of the hackathons
- Agenda for the meeting
- Day 1:
- the morning is dedicated to brief introductions from the LT members - many of us have not met in person;
- the rest of the morning is dedicated to some informational presentations; possible talks:
- Hackathons: what they are, what they have accomplished and can accomplish, what works and what doesn't, good practices
- The EvoIO stack technology and its current status
- Current trajectories of big projects like TreeBase, iPlant, and TOLWeb
- The whole afternoon is dedicated to the discussion of the grand strategy - review the proposal, decide topics and order of the hackathons
- Possibly dinner and drinks to socialize afterwards
- Day 2:
- Fully dedicated to the planning of the first hackathon
- Identification of topic and objective
- Clarification of duties and distribution of tasks
- Publicity and project tracking
- We have a wiki, but we need a broader plan for advertising: how do we reach out to potential participants? We also need to produce press releases for the hackathons (for NESCent and other institutions). We may also want to have bigger aspirations, e.g., create a brand for the group that can continue beyond HIP
- We need a mechanism for project tracking (looking into software to do that)
- Another evening social event?
- Day 3:
- Dedicate the morning to discuss what is left over from day 2
- Discuss also the strategy for HIP to become a bigger group, develop its own brand and image, embark in other projects
- Additional considerations
- Hilmar will create a Mendeley group to collect interesting readings for the group
- Discussion of the PRF (Rutger)
- Just completed the annual board meeting
- Governing body for data resources and projects related to phyloinformatics; overseeing TreeBase and TOLWeb
- In place for the last 2 years
- Currently developing its own brand and image
- Provides GitHub hosting for software projects
- Exploring funding strategies (e.g., donations in cooperation with existing societies); also exploring contacts with the Sloan and MacArthur foundations; already established as a non-profit
- Can be found at [1]
November 18, 2011
Agenda
- Responsibilities of the LT (brief review, no discussion-- we just want to hit the highlights)
- developing a long-range strategy
- choosing hackathon themes and dates
- recruiting partners and key participants
- advertising events
- soliciting and reviewing applications from participants
- managing hackathon events (with partners)
- mentoring participants in follow-ups to hackathon projects
- documenting and reporting results (wiki, publications)
- Accountability of the LT (to NESCent, to partners, to participants)
- see HIPPartners
- Management models. This leads to several questions:
- What will be the organizational structure of the group?
- How will we make decisions as a group?
- How do we expect responsibility for tasks to be distributed among members?
- How will we track activity, including progress on action items or goals?
- How will we maintain energy and nurture good working relationships?
Participants: Brian Sidlauskas, Arlin Stoltzfus, Sergei Kosakovsky Pond, Enrico Pontelli, Rutger Vos, Hilmar Lapp, Michael Rosenberg, Karen Cranston
Management Model
- Proposal was made and accepted to establish an Executive Committee (EC) composed of Arlin Stoltzfus, Rutger Vos and Enrico Pontelli (rationale is that group is too large to make all decisions)
- Duties of the EC
- Handle the mundane operations of the LT
- Make decisions for non-controversial issues and for other items as delegated by the LT
- Assumes the main responsibility for the management and administrative activities of HIP - e.g., administration, program management, project tracking, task tracking, communication.
- It is emphasized that the EC will focus on procedural issues, not strategic and scientific issues. The EC will operate within the authority given to it by the Leadership Team
- The EC will also investigate the use of project management software
- Note that the Leadership Team (LT) maintains the responsibility of making strategic and scientific decisions
- The LT had a discussion about the need for people to volunteer in leading the various tasks to be performed; volunteers will step forward based on their skills and preferences.
Decision-making Model
- Several strategic decisions will have to be made by the LT; e.g., selection of themes, invited participants, selection of applicants
- LT has decided to operate by consensus; we do not envision any controversial arguments to occur; if they do, we will resolve them through discussions; we will not expect the chair to break a tie or take sides in a stalemate.
- It was decided to use email for general discussions but to rely on either live conversations or a delegation model for the more controversial decisions
- in the delegation model, a chosen person has the authority to make the decision after getting input from the group
How to Maintain the Energy and Momentum
- Brief discussion about using Google+ or some other social network platform to aggregate ideas and maintain communications and discussions
- Alternative tools could include wikis, tweets, RSS feeds
October 28, 2011
Participants: Sergei, Enrico, Michael, Brian, Arlin, Mark (Fish), Mark (Semantic), Hilmar, Rutger; absent: Karen (grant proposal)
Brief Introductions and Expectations of Leadership Team Members
- Hilmar:
- Desire to remove barriers, facilitate synthetic access to data
- Looking forward to interacting with groups of people
- Hackathons are enjoyable, intense and rewarding; a great opportunity to sit down and develop software
- Sergei:
- Hope to get input from the field on how to enhance data interoperability in the broad biomedical field
- Gain different perspectives from the community (already learned a lot from previous hackathons)
- In fields like virology there are no good solutions for data exchange; there is a lack of standardization and difficulty in making next-generation sequencing data retrievable
- Michael:
- No experience with hackathons, but extensive experience as programmer and developer
- Desire to learn and be involved in the development of community standards
- Desire to see solutions to the problem of getting data from point A to point B to point C
- Brian:
- Experience in systematics and collection databases (more from the organisms perspective and less from the computational perspective)
- Desire to explore new ways to link information across collections
- Interested in contributing to the dialog on interoperation and helping the community
- Arlin:
- Experience in population genetics, molecular evolution and computational biology
- Experience in reusing data in unexpected ways and in combining data produced by different people
- As part of the NIST mission, interest in facilitating data exchange and interoperability
- Rutger:
- Extensive experience with hackathons
- Interest in development of standards (such as NeXML and PhyloWS)
- Interest in taking this technology stack (and other interesting technologies) to the next level and deploy them in different projects
- Mark (Fish):
- Interest in biodiversity, visualization, interactions, composition of different information sources
- BioSynC has already followed the NESCent model to bring together people with different expertise to create new solutions
- Interested in bringing the perspective of phylogenetics (of fishes) to the team as well as his programming expertise
- Mark (Semantic):
- Interest in interoperation of data and online resources; use of semantic solutions
- Interested in automating workflows through semantics
- Hope to bring expertise in semantically modeling complex data and complex data structures.
- Enrico:
- Interested in knowledge representation, ontologies, and automated reasoning
- Part of the team that created CDAO
- Interested in exploring ontologies and other knowledge representation and reasoning solutions to enhance interoperation and possibly moving towards automating workflows
Meeting Planning
- Tentative window for the face-to-face meeting of the leadership team is January 19-21, 2012
- Tentatively, the meeting will include the leadership team and up to two or three external guests (possibly someone with expertise in data resources, as this will be the focus of the first hackathon)
Summary of Hackathons Plan
- Three hackathons:
- one at NESCent on Data resources
- one at the Field Museum in Chicago on Data integration
- one at Arizona State University on Data visualization
- Fundamental idea is to build links in a network of interoperable resources, possibly using existing technologies like NeXML, CDAO and PhyloWS; help identify other relevant technologies and gaps, and create new solutions.
General Discussion
- Discussion of possible Hackathon models
- traditional model of bringing 20-30 people for an intense and spontaneous programming experience
- participants may or may not already know each other
- initial period with informational talks to bring people up to speed with problems and resources
- followed by a proposal period where ideas are proposed and discussed
- then working groups (3-7 people) form and work on specific problems
- initial idea is to issue a number of targeted invitations (e.g., for two-thirds of the participants) followed by an open call; applicants need to be screened to ensure they bring relevant expertise and skills and are collaborative.
- emphasize the importance of producing tangible results at the end of the 4-day effort
- In the HIP effort there are additional considerations
- The leadership team will organize not one event but three of them; questions about the connections among them (are they independent and each will explore fresh ideas? Is the second dependent on the results of the first?)
- There is a greater emphasis on follow-ups from each hackathon, trying to encourage continued activities and leading to concrete results (papers, proposals)
- Need to look at the results from the previous hackathons (in terms of how they have been organized) and learn from them
- For example, how open should the list of goals be? How many people should receive targeted invitations? How to constrain the initial open space to ensure that the ideas are not too abstract?
Concluding remarks
- Encourage the team to use the Wiki to share ideas about the organizational structure of the hackathons
- Next teleconference on November 18th, at 12:00pm
August 29, 2011
(notes by Karen)
Agenda
The major tasks that the LT faces are to fill out its roster, start having regular teleconferences, and schedule the first meeting. According to past discussions, we are still one team member short to represent ecology-biodiversity interests, ideally with a global perspective. Team members are likely to be more senior than hackathon participants, because we want someone with connections and a broad awareness of his or her field. An obvious course of action would be to recruit this person via an open call. We already have a strategy for this: we would edit the Google+ ad to emphasize ecology-biodiversity, and recruit that last individual via a semi-formal application process.
In addition, Arlin would like for us to consider alternative ways that the LT might use to ensure that ecology-biodiversity interests are reflected in our projects.
Minutes
- present: Rutger, Enrico, Hilmar, Karen, Mark, Arlin
- initial unrelated hurricane discussion
- recap of past activities
- recruited Mark Wilkinson and Sergei Kosakovsky Pond
- third invite declined (too busy)
- first open call did not go out broadly enough
- still short one person with ecology / biodiversity interests
- should we still try and recruit another?
- Karen: would love to have someone from ecology, but we have spent much time on this process and feel we need to start moving forward with scheduling first meeting and hackathon planning
- Enrico: shares concern about moving forward; perhaps avoid another open call
- Mark: me too (agrees with Enrico); get ecologists involved at hackathons
- Rutger: worried about the time required for an open call; can we move forward with meeting planning and keep trying to find someone else?
- Hilmar: on fence; limited diversity bothers him (demographic and scientific); skeptical about results from another open call, even if circulated more widely
- Arlin: really skeptical about finding someone in our Rolodex, especially about finding someone senior enough to have the time to be involved
- AS: how much enthusiasm for going ahead with open call?
- EP: still go ahead with meeting schedule
- can we find someone local who might be interested?
- KC: don't want to combine open call with private invitations
- need to rewrite the advertisement to be more specific about the discipline and about needing someone who has time and connections for planning
- general agreement about sending out an open call (and doing it well)
- get open call out by Friday (i.e. text done by Friday)
- where? evoldir (Arlin), NCEAS (Hilmar), ecolog (Hilmar), Triangle area universities bio departments (Karen)
- what will we ask people to do and by when? have a form, give people two weeks (10 days, until September 12th)
- Mark: should we ask people to contact us rather than fill out the form?
- Hilmar: people may have questions, provide email address for that purpose
- what are we looking for?
- experience applying comparative methods in ecology; have made connection between ecology and phylogeny
- has big picture in mind - integrative science
- has connections in field; could effectively advertise within this community
- interested in working with experts to develop and apply tools; user or developer
- has time to commit to face to face planning meeting in fall, teleconferences approximately once per month, presence at hackathons (3 over 2 years)
- need to make this sound exciting for them (not just service on their part)
- what next after September 12th?
- choose candidates, teleconference sometime Sep 14-16
- discussion of web presence: should this be on EvoIO, the NESCent Informatics site, or a separate NESCent wiki? No need to have a separate site.
Action items
- Hilmar: will take responsibility for getting the text done by Thursday night
- everyone but Hilmar: make final comments and edits on Friday morning
- Arlin: send evoldir ad
- Karen: send to Triangle bio lists
- Hilmar: send to ecolog and NCEAS
- Rutger: welcome email to SemanticMark and Sergei, and help ticket to add them to the hiplt mailing list
- Karen: doodle poll for next call (after Rutger sends invite mail)
- Hilmar: erase archives of hiplt list (where we accidentally discussed names of potential team members)
July 25, 2011
- filling out the leadership team
- what is our purpose in wanting to increase diversity? (diversity in disciplines, projects, social circles?)
- diversity important for outcomes, wrt disciplines, stakeholders, demographic (HL)
- goal to increase diversity of resources linked together, reach out, challenge assumptions (RV)
- demographic diversity (EP)
- do we want to recruit a 7th member in time for the Sept meeting? - yes, ideally
- what are we missing? what areas? (disciplinary-problem areas more important than methodology-approach)
- active work in comparative methods, users of trees (KC)
- ecology, systematics (HL, AS)
- biodiversity science, biomedical (HL)
- semantic linking, aggregation (EP)
- visualization (RV)
- do we have suitable candidates already? if not, how do we find them in time?
- Recruit candidate 1 (RV)
- Recruit candidates 2 & 3 (HL)
- Our plan of action
- RV resend advert
- all edit by Wednesday 27th July
- add focus on systematics - interest in reuse & interoperability; ideally non-molecular characters
- reply with a few sentences on what they hope to get out of it; what they can contribute
- Arlin will make a form for applications
- August 5 deadline
- taxacom (HL), tolweb contrib list (KC)
- August 8 telecon at 11:30 am; reschedule if the decision is complicated
- cancel the meeting with NESCent (AS)