EOL Phylogenetic Tree Challenge

Background

Many biologists seek a well-supported tree of all life that reflects our best understanding of evolutionary relationships among organisms (see this recent review).  Such a tree would provide unprecedented ability to advance our knowledge in biological fields such as developmental genetics, comparative ecology, biogeography, and other areas where pattern and process must be studied in an evolutionary context. Such a tree would also provide a meaningful framework for organizing knowledge about organisms so that it is readily discovered and used by all audiences.

Encyclopedia of Life wishes to give its users phylogenetically informed ways to browse and retrieve information from its pages. While EOL is not yet able to manage phylogenies or their associated data in any explicit way, it does let users navigate taxon pages according to multiple classifications. These are considered browsing hierarchies (see a list). While some of these classifications aim to be comprehensive, none is yet exhaustive and none adequately reflects the current state of phylogenetic knowledge.

This contest has two core purposes:

  • It provides a testbed for the Evolutionary Informatics community to develop robust methods for producing, serving, and evaluating large, biologically meaningful trees that will be useful both to the research community and to broader audiences.
  • It enables the Encyclopedia of Life to organize the information it aggregates according to phylogenetic relationships; in other words, it provides a direct pipeline from research results to practical use.

The challenge

A prize is offered to the individual or team that can provide a very large, phylogenetically organized set (or sets) of scientific names suitable for ingestion into the Encyclopedia of Life as an alternate browsing hierarchy.

  • Names must be provided in Darwin Core Archive (DwC-A) format [1].
  • Extinct organisms may be treated but are not required.
  • Ranks are not required for names but may be included.
  • Internal nodes need not all have formal Linnaean names, but each requires a label, which can be arbitrary. Leaf nodes also need not have formal names, but ideally most will overlap with current EOL species pages.
  • For the purpose of this contest, metrics and sources of node support, branch lengths, vernaculars, and synonyms are not required. These may be included, although not all of them can currently be displayed on EOL.
  • Trees must be rooted. Multiple, overlapping hierarchies may be submitted as a set (e.g. to handle reticulation), but within each file a name cannot have more than one parent.
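The structural rules in the list above (a rooted tree in which no name has more than one parent) can be checked before uploading. The following is a minimal sketch, not an official validator. It assumes the taxon table of the archive is tab-delimited with a header row, that the columns are named with the Darwin Core terms taxonID and parentNameUsageID, that a "name" is identified by its taxonID, and that the root is the single row with an empty parent. The function name check_hierarchy is hypothetical.

```python
import csv
import io


def check_hierarchy(tsv_text):
    """Return a list of problems found in a tab-delimited taxon table.

    Assumed columns (Darwin Core terms): taxonID, parentNameUsageID.
    An empty parentNameUsageID is assumed to mark the root.
    """
    rows = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    problems = []
    parent_of = {}

    for row in rows:
        taxon_id = row["taxonID"]
        parent_id = row["parentNameUsageID"] or None
        # The same identifier listed under two different parents breaks the rule.
        if taxon_id in parent_of and parent_of[taxon_id] != parent_id:
            problems.append(f"{taxon_id} has more than one parent")
        parent_of.setdefault(taxon_id, parent_id)

    # A rooted tree has exactly one node without a parent.
    roots = [t for t, p in parent_of.items() if p is None]
    if len(roots) != 1:
        problems.append(f"expected exactly one root, found {len(roots)}")

    # Every parent that is referenced must itself be present in the file.
    for taxon_id, parent_id in parent_of.items():
        if parent_id is not None and parent_id not in parent_of:
            problems.append(f"{taxon_id} refers to missing parent {parent_id}")

    # Walk upward from each node; revisiting a node means there is a cycle.
    for start in parent_of:
        seen = set()
        node = start
        while node is not None and node in parent_of and node not in seen:
            seen.add(node)
            node = parent_of[node]
        if node is not None and node in seen:
            problems.append(f"cycle reachable from {start}")
            break  # report only the first cycle found

    return problems
```

An empty list means that none of these checks failed. It does not guarantee that the file will be accepted for ingestion.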

Among other factors, the total number of uniquely named nodes, node/leaf ratios, and tree height may be used to compare entries, so contestants should consider how they wish to trade off strict consensus against other methods of reflecting the state of phylogenetic knowledge.
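To see how resolution affects these comparison factors, the statistics can be computed from a simple parent lookup. The sketch below is illustrative only, since the challenge does not define the measures precisely. It assumes that the node/leaf ratio means internal nodes divided by leaves, that tree height means the largest number of edges between the root and any leaf, and that the input is a valid tree with a single root. The function name tree_stats and the labels in the example are hypothetical.

```python
from collections import defaultdict


def tree_stats(parent_of):
    """Summarize a rooted tree given as {node label: parent label or None}."""
    children = defaultdict(list)
    root = None
    for node, parent in parent_of.items():
        if parent is None:
            root = node
        else:
            children[parent].append(node)

    # A leaf is any node that never appears as a parent.
    leaves = [n for n in parent_of if n not in children]
    internal = len(parent_of) - len(leaves)

    # Iterative depth-first walk, so very deep trees do not hit the recursion limit.
    height = 0
    stack = [(root, 0)]
    while stack:
        node, depth = stack.pop()
        height = max(height, depth)
        for child in children.get(node, ()):
            stack.append((child, depth + 1))

    return {
        "nodes": len(parent_of),
        "leaves": len(leaves),
        "internal_per_leaf": internal / len(leaves),
        "height": height,
    }


# Four leaves joined at an unresolved root (as a strict consensus might give).
star = {"root": None, "A": "root", "B": "root", "C": "root", "D": "root"}

# The same four leaves, fully resolved with arbitrary internal labels.
resolved = {"root": None, "A": "root", "x": "root",
            "B": "x", "y": "x", "C": "y", "D": "y"}

print(tree_stats(star))
print(tree_stats(resolved))
```

In this example the unresolved tree has 5 nodes, a ratio of 0.25, and a height of 1, while the resolved tree has 7 nodes, a ratio of 0.75, and a height of 3 for the same set of leaves.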

The four key problems to solve are 1) how to assign labels to unnamed nodes, 2) how to fill in gaps so that the set of taxa included is as comprehensive as possible, even if trees are not fully resolved or not all taxa have been analyzed, 3) how to handle competing hypotheses, and 4) how to update the hierarchy at least annually.

The winning submission must be available to EOL and others under an acceptable CC license if it is under copyright.  The tree need not be previously published in peer-reviewed form.

Questions about the challenge may be asked in the Phylogenetic Tree Challenge community on EOL.

Submission process

Contestants shall submit their Darwin Core Archive (DwC-A) data file(s) online to the Global Names project at http://postbox.globalnames.org. You will first upload your file and then receive an email when it has been processed and is ready for preview. If the basic statistics are not what you expect, you may need to check your file and try again. Please ask for help in the Phylogenetic Tree Challenge community. You will have a limited time to finalize a submission by providing metadata about your file. Once a submission is finalized, it will be publicly browsable and downloadable with all associated attribution and licensing you specify.

Please submit as early as possible in order to get feedback and make sure your file can be processed. You may make a new submission at any time before the deadline and we will ignore your older submissions.  We encourage contestants to combine efforts to improve the final trees.

Contestants should also send a 2- to 5-page explanation of their methods, citations of original sources, and a description of how they have addressed the four key problems, including how they expect that the tree can be updated at least once a year. If you are submitting several files for one challenge entry, please indicate this in your explanation. This explanation should be sent to eol.tree.challenge@gmail.com. Submissions that lack an associated explanation by the final submission deadline will not be considered.

Deadlines

The submission period opened 20 February 2012. Final submissions are due by 15 April 2012.

Judging

A panel consisting of taxonomic and informatics experts both internal and external to EOL will evaluate submissions and award the prize.  As noted above, the panel will consider factors such as completeness, scientific rigor, and reproducibility. These will be apparent both in the explanation of the solutions to the four key problems and in the data files themselves, which will be checked for size and consistency with currently published results in known problem areas.

Prizes and other benefits

A grand prize winner will receive an all-expenses-paid trip to iEvoBio 2012 [2] courtesy of the Encyclopedia of Life. The successful tree will be published, with attribution, on EOL as an alternate browsing hierarchy. EOL will issue a press release to highlight the contest and its winner. Other submissions may be published on EOL by mutual agreement of authors and EOL. All contestants are encouraged to submit manuscripts describing their entry to peer-reviewed journals. Support for open access fees may be available from Encyclopedia of Life.

The Global Names project will award a second 2012 iEvoBio attendance prize to the project that shows great promise to scale to all clades and/or that is capable of responding quickly to new phylogenetic insights.  The number of terminal elements in the submission will not be a major consideration. Instead, GN will reward the approach that shows the greatest promise of scaling to become a comprehensive tree. GN will work with this prizewinner to help them realize their vision.

Eligibility

EOL will provide transportation, lodging, registration, and per diem costs as allowed by U.S. Federal travel guidelines. International applicants are eligible. If an entry is submitted by a team, travel by only one member of the team will be supported. While acceptance in the program of iEvoBio 2012 cannot be guaranteed, submission of a full paper is strongly encouraged.

Additional information

To ask questions or discuss this challenge, please join the Phylogenetic Tree Challenge community on Encyclopedia of Life.

A related challenge has been announced for the iEvoBio conference, focusing on Synthesizing Phylogenies. Though their goals are somewhat different, the same entry may (or may not) meet requirements for both challenges. We are in close communication with iEvoBio and encourage discussion about how to leverage these opportunities.

[1] You may use any method to prepare your Darwin-Core-Archive-formatted submission. Tools that are known to output EOL-ingestible Darwin Core files include: http://www.lifedesks.org and http://gnite.org. GBIF's Darwin Core Archive Assistant may also be helpful.

[2] iEvoBio 2012 is held in association with the Evolution 2012 meetings in Ottawa, Canada.