2013 Rubenstein Competition


This international program seeks to support scientists' use of EOL resources in biodiversity research. Over the past two months, EOL has solicited ideas from the research community about areas in which EOL content, properly queried, could illuminate large-scale questions in biology and advance biological research. Eleven specific requests (below) were put forward by the research community and selected by the EOL curator community as the most promising, tractable and potentially impactful projects. We are calling for proposals to extract the described data from the EOL collection and to advance the corresponding inquiry for one or more of the Wishes below.

Applicants must indicate the Wish (or Wishes) targeted by their proposal, describe the plan and methodology for extracting the requested information, and detail the structure and format of the dataset or datasets which will be delivered. 


All proposals must have a single principal investigator, to be known as the Fellow. Collaborative proposals are allowed and encouraged, and any subawards to other participating collaborators should be included in the budget. The Fellow will be responsible for project reporting, but all proposal participants are encouraged to communicate with EOL staff.

Any method is acceptable, provided it is transparent and replicable. Crowdsourcing? Semantic reasoning? Automated image analysis? Yes. You've thought of a method we haven't? Even better. A strong proposal will be led by a Fellow with solid professional credentials in the proposed methodology. Formal affiliation with an academic or research institution is not required.

Awards will be administered as contracts with the Smithsonian Institution. Work may take place anywhere; regular communication with EOL staff will be required.

Original Wish authors are eligible to lead or participate in a proposal. Current EOL staff may provide advice but may not be supported by a project budget.

International applicants who need help interpreting the details of the application instructions should contact us.

Research Wishes

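Each Wish below is followed by hints and guidance on possible methods, a link to its discussion, and, where given, the Wish author.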
I wish that I could use the following EOL or BHL content: text from IUCN sources; brief summaries; morphological, habitat and behavioural chapters describing species traits; and phylogenetic classifications, to obtain the following data: vulnerability to climate change, and the similarity of vulnerable species in their traits and in their phylogeny, in order to answer the following biological research questions:

What are the most important physical or behavioural traits associated with vulnerability to climate change in organisms already assessed by the IUCN, and can these results be extended to organisms that have not yet been assessed? What is the relative speed of evolution in those traits that are important for adaptation of species to global change?
* Text mining?
* EOL + IUCN, Open Tree of Life?
Visit the discussion
Author: Barbara Bauer
I wish that I could use the following EOL or BHL content: text, including image captions and morphological descriptions on EOL pages, specifically passages that mention regeneration ability (or the lack of it), to obtain the following data: the degree of regeneration (perfect, partial, none) and the body part that was tested for regeneration, in order to answer the following biological research question:

What is the distribution of the ability to regenerate lost body parts across animal phyla?
* Text mining?
* Semantic reasoning?

Visit the discussion
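The extraction step in this Wish could start as simple pattern matching over sentences. The sketch below is only an illustration under stated assumptions: the cue phrases and the body-part vocabulary are hypothetical, the function name `extract_regeneration` is our own, and the patterns ignore negation scope and hedging (for example, "does not completely regenerate" would be misread as perfect regeneration).

```python
import re
from typing import Optional, Tuple

# Hypothetical, deliberately simple cue phrases for each degree of regeneration.
# Checked in order: "none" first, then "partial", then "perfect".
DEGREE_PATTERNS = [
    ("none", re.compile(
        r"\b(?:cannot|can not|unable to|does not|do not|fails? to)\s+regenerate\b"
        r"|\bno regenerat", re.IGNORECASE)),
    ("partial", re.compile(
        r"\b(?:partial(?:ly)?|incomplete(?:ly)?)\s+regenerat"
        r"|\bregenerat\w*\s+(?:only\s+)?partially\b", re.IGNORECASE)),
    ("perfect", re.compile(
        r"\b(?:complete(?:ly)?|full(?:y)?|perfect(?:ly)?)\s+regenerat"
        r"|\bregenerat\w*\s+(?:completely|fully|perfectly)\b", re.IGNORECASE)),
]

# Hypothetical body-part vocabulary, matched in singular or plural form.
BODY_PART = re.compile(r"\b(arm|leg|limb|tail|head|fin|tentacle)s?\b", re.IGNORECASE)


def extract_regeneration(sentence: str) -> Optional[Tuple[str, Optional[str]]]:
    """Return (degree, body_part) for a sentence, or None if no cue phrase matches."""
    for degree, pattern in DEGREE_PATTERNS:
        if pattern.search(sentence):
            part = BODY_PART.search(sentence)
            return (degree, part.group(1).lower() if part else None)
    return None


print(extract_regeneration("Adult mammals cannot regenerate limbs."))
```

Each matched sentence yields one candidate record; a real pipeline would also keep the taxon and the EOL page or BHL page the sentence came from.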

I wish that I could use the following EOL or BHL content: multiple classifications harvested by EOL, to obtain the following data: degree of coverage and congruence among hierarchies and nomenclatures, in order to answer the following biological research question:

What organisms are understudied taxonomically (i.e., are represented in few classifications) and what is the distribution of agreement/controversy across the branches of the tree of life?

Visit the discussion
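One way to make "coverage" and "congruence" concrete, offered as a sketch rather than a prescribed method: count how many harvested classifications include each taxon, and measure how often they agree on its parent. The data model (each hierarchy flattened to a taxon-to-parent mapping), the function names and the example sources are hypothetical, and real hierarchies would first need their names reconciled (synonyms, spelling variants).

```python
from collections import Counter, defaultdict
from typing import Dict

# Each classification maps a taxon name to its parent name (hypothetical model).
Classification = Dict[str, str]


def coverage(classifications: Dict[str, Classification]) -> Counter:
    """Count how many classifications include each taxon."""
    counts: Counter = Counter()
    for hierarchy in classifications.values():
        counts.update(hierarchy.keys())
    return counts


def parent_agreement(classifications: Dict[str, Classification]) -> Dict[str, float]:
    """Fraction of the classifications containing a taxon that agree on its most common parent."""
    parents = defaultdict(list)
    for hierarchy in classifications.values():
        for taxon, parent in hierarchy.items():
            parents[taxon].append(parent)
    return {
        taxon: Counter(ps).most_common(1)[0][1] / len(ps)
        for taxon, ps in parents.items()
    }


# Placeholder sources and taxa, for illustration only.
example = {
    "source_1": {"taxon_a": "genus_x", "taxon_b": "genus_x"},
    "source_2": {"taxon_a": "genus_x", "taxon_b": "genus_y"},
    "source_3": {"taxon_a": "genus_y"},
}
print(coverage(example))          # taxon_a appears in 3 sources, taxon_b in 2
print(parent_agreement(example))  # taxon_a about 0.67, taxon_b 0.5
```

Low coverage would flag taxonomically understudied groups, and low agreement would flag controversial placements; both can be summarized per branch of the tree.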

I wish that I could use the following EOL or BHL content: localities mentioned in text from EOL distribution chapters, and localities and taxon names from BHL pages, to obtain the following data: verbal localities for species occurrences, georeferenced to coordinates using a gazetteer, in order to answer the following biological research questions:

Can errors in specimen or observation point data be detected by comparison with verbal localities? What are the patterns of different error types (sign errors, transposition of coordinates, transposition of digits...)?
* Text mining?
* Gazetteers?
* EOL + GBIF, OBIS, DiscoverLife?

Visit the discussion
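The error types named in this Wish can be tested mechanically once a verbal locality has been georeferenced. The sketch below assumes each verbal locality resolves to a single gazetteer point, uses a hypothetical 50 km tolerance, and checks only sign errors and swapped latitude/longitude; digit transpositions would need an additional candidate generator. Function names are illustrative.

```python
from math import asin, cos, radians, sin, sqrt
from typing import Iterator, Tuple

Point = Tuple[float, float]  # (latitude, longitude) in decimal degrees
EARTH_RADIUS_KM = 6371.0


def is_valid(p: Point) -> bool:
    """True if the point lies within legal latitude/longitude ranges."""
    return -90.0 <= p[0] <= 90.0 and -180.0 <= p[1] <= 180.0


def haversine_km(a: Point, b: Point) -> float:
    """Great-circle distance between two points (haversine formula)."""
    lat1, lon1 = map(radians, a)
    lat2, lon2 = map(radians, b)
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(min(1.0, max(0.0, h))))


def candidate_errors(lat: float, lon: float) -> Iterator[Tuple[str, Point]]:
    """Yield (error type, corrected point) for each hypothesised error."""
    yield "sign error (latitude)", (-lat, lon)
    yield "sign error (longitude)", (lat, -lon)
    yield "sign error (both)", (-lat, -lon)
    yield "transposed coordinates", (lon, lat)


def classify(record: Point, gazetteer: Point, tolerance_km: float = 50.0) -> str:
    """Label a point record by comparing it with its georeferenced verbal locality."""
    if is_valid(record) and haversine_km(record, gazetteer) <= tolerance_km:
        return "consistent"
    for label, corrected in candidate_errors(*record):
        if is_valid(corrected) and haversine_km(corrected, gazetteer) <= tolerance_km:
            return label
    return "unexplained"


print(classify((-12.5, 130.8), (12.5, 130.8)))
```

Tallying these labels over many records would give the pattern of error types the Wish asks about; the tolerance should reflect the precision of the verbal locality.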

I wish that I could use the following EOL or BHL content: photographs of species, records in GBIF, and physical descriptions and distributions of organisms, to obtain the following data: detectability and popularity data for organisms that would help in estimating true population sizes from field records, in order to answer the following biological research question:

What are the characteristics of plants and animals that make them most likely to be recorded by people?
* Text mining?
* EOL + GBIF, online occurrence reporting platforms?

Visit the discussion

I wish that I could use the following EOL or BHL content: text fields that contain other species names or specific keywords related to interactions (e.g., 'feeds on'), and partner information and harvest history for partners providing association information to EOL, to obtain the following data: interspecific association data assembled into a network (similar to a social network) that can then be analyzed for connectivity using measures such as centrality, density and cohesion, with the association type, source and EOL chapter recorded for each association, in order to answer the following biological research questions:

Does species A interact with more organisms than species B? Are the organisms in ecosystem A more interconnected than those in ecosystem B? How important are macro-species in global ecosystems? Can their presence explain patterns in microbial interaction networks, and vice versa? What is the most efficient way to extract existing knowledge about these interactions from the biological literature and the biological community, and make it available in a useful form on the EOL platform?
* Text mining?
* Semantic reasoning?
* Network visualization?

Visit the discussion

Authors: tanager, athessencsparr, gilbertjack
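A minimal sketch of the network step, assuming associations have already been extracted as records carrying the type, source and chapter fields this Wish asks for. It builds an undirected network and computes two of the named measures, density and degree centrality; cohesion measures are not covered. The record fields, names and placeholder values are hypothetical. The networkx library's `density` and `degree_centrality` functions compute the same quantities on an undirected `Graph` and would be a natural choice for larger analyses.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Set


@dataclass(frozen=True)
class Association:
    """One extracted interaction plus its provenance (hypothetical field names)."""
    subject: str
    target: str
    kind: str     # association type, e.g. "feeds on"
    source: str   # content partner that supplied the text
    chapter: str  # EOL chapter the text came from


def build_network(associations: Iterable[Association]) -> Dict[str, Set[str]]:
    """Undirected adjacency sets; self-associations are ignored."""
    adjacency: Dict[str, Set[str]] = defaultdict(set)
    for a in associations:
        if a.subject != a.target:
            adjacency[a.subject].add(a.target)
            adjacency[a.target].add(a.subject)
    return dict(adjacency)


def density(adjacency: Dict[str, Set[str]]) -> float:
    """Edges present divided by edges possible: 2m / (n(n - 1))."""
    n = len(adjacency)
    if n < 2:
        return 0.0
    m = sum(len(nbrs) for nbrs in adjacency.values()) // 2
    return 2 * m / (n * (n - 1))


def degree_centrality(adjacency: Dict[str, Set[str]]) -> Dict[str, float]:
    """Each species' number of partners, normalised by the n - 1 possible partners."""
    n = len(adjacency)
    if n < 2:
        return {species: 0.0 for species in adjacency}
    return {species: len(nbrs) / (n - 1) for species, nbrs in adjacency.items()}


# Placeholder records, for illustration only.
records = [
    Association("sp_A", "sp_B", "feeds on", "partner_1", "chapter_1"),
    Association("sp_A", "sp_C", "feeds on", "partner_1", "chapter_1"),
    Association("sp_D", "sp_A", "parasite of", "partner_2", "chapter_2"),
    Association("sp_B", "sp_C", "feeds on", "partner_2", "chapter_1"),
]
net = build_network(records)
print(density(net))            # about 0.67: 4 of 6 possible links present
print(degree_centrality(net))  # sp_A is linked to every other species
```

Species with no recorded associations never enter the network, so comparisons between ecosystems should take the full species list into account when counting n.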
I wish that I could use the following EOL or BHL content: data on species characteristics (e.g., "Facts" sections such as morphology, diet, etc.), as well as georeferenced records of species throughout their current range (in particular, these records would need to include recent suburban and urban observations, not just historical ranges), to obtain the following data: species occurrence records and species traits, in order to answer the following biological research questions:

What species occur in urban and suburban areas, and which subset of these species thrives in urban areas (e.g., urban adapters or exploiters, or "synurbic" species)? What are the biological attributes of species that thrive in urban and suburban environments? Of particular interest are terrestrial species with limited dispersal ability.
* Text mining?
* EOL + GBIF, Map of Life, online occurrence reporting platforms?

Visit the discussion

I wish that I could use the following EOL or BHL content: IUCN status; Threats, Management, ConservationStatus, Procedures, Distribution and Habitat chapters, and Maps, to obtain the following data: species' conservation status; current data requirements; location; estimate of species detectability, in order to answer the following biological research question:

Which species of conservation concern could be most usefully targeted for student or citizen science research expeditions?
* Text mining?

Visit the discussion

I wish that I could use the following EOL or BHL content: all relevant databases and museum records, to obtain the following data: geographic range data for invertebrate species, across all animal phyla, that are found on coral reefs and are widespread throughout the tropical Indo-Pacific, in order to answer the following biological research question:

Approximately how many coral reef associated marine invertebrate species are widespread across the tropical Indo-Pacific?
* Text mining?
* EOL + OBIS? 

Visit the discussion

Author: John Horne
I wish that I could use the following EOL or BHL content: images and chapter text (e.g., look-alikes), to obtain the following data: known mimicry species pairs and groups, with additional information (ecology, distribution, habitat, phylogeny and evolution), in order to answer the following biological research question:

What are the biological correlates of mimicry relationships?
* Text mining?

Visit the discussion

I wish that I could use the following EOL or BHL content: images, morphological descriptions, and habitat information from many EOL partners and BHL articles, to obtain the following data: colour of organism, altitude and depth information, in order to answer the following biological research questions:

Is blue coloration more likely to occur in high-altitude or shallow-water plant and animal species? Is the use of red pigment in flowers and berries conserved or derived across many clades of plants?
* Text mining?
* Image analysis?

Visit the discussion

Author: Katja Seltmann


Dataset: A successful proposal must deliver a dataset as described in one of the Wishes above. If your method can be leveraged to provide data for more than one Wish, multiple datasets may be proposed, and doing so will strengthen the proposal.

Applicants are encouraged to consult Wish authors and seek input from other potentially interested researchers to guide their planning. Interested EOL community members will be monitoring the discussions linked above. Applicants are encouraged to post ideas and seek advice on particular Wishes there, as well as in the general newsfeed of the Research Ideas community. A strong proposal should attract the interest of potential users of the dataset. Letters of support from researchers who would be interested in using the proposed dataset are encouraged.

If the project needs additional content that is not yet in EOL but is appropriate for the collection, that content should be deposited during the course of the project. A letter of support from the content owner should be included with the proposal, unless the resource is in the public domain.

Technical assistance with content import and export will be available from EOL staff for candidates who require it. Proposals should state explicitly whether this support will be needed or whether the candidate plans to carry out data migration using existing EOL protocols and web services.

Analysis: A basic proposal may simply plan to deliver the dataset, which will then be made available to all interested parties for analysis. Proposals may, but need not, also include analysis and/or visualization of the data to advance the research question. In that case, the analysis and results are expected to be made public, either in a scholarly publication or in some public venue online, and the proposal must include a commitment from a qualified researcher to participate. The researcher's effort may or may not be included in the budget, but either way a letter of support should describe their plans to work with the data. If analysis is included in the proposal, the raw data must still be made publicly available by the end of the project.

Documentation: A data-delivery-only proposal must commit to documenting the method so that a third party can reproduce it for another dataset if needed. The documentation should include the evaluation methods used to estimate the method's success rate. A proposal that includes analysis must also document its analysis methods in the published results.
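The call does not prescribe an evaluation method. One common option, sketched here under that assumption, is to hand-verify a reference sample drawn from the same source texts and compare it with the automatically extracted records using precision (the share of extracted records that are correct), recall (the share of verified records that were found) and their harmonic mean, F1. The record layout, function name and example values are hypothetical.

```python
from typing import Hashable, Iterable, Tuple


def precision_recall_f1(
    extracted: Iterable[Hashable], verified: Iterable[Hashable]
) -> Tuple[float, float, float]:
    """Compare extracted records with a manually verified reference sample."""
    found, truth = set(extracted), set(verified)
    true_positives = len(found & truth)
    precision = true_positives / len(found) if found else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical (taxon, trait, value) records for the same set of source pages.
extracted_records = {
    ("sp_A", "regeneration", "perfect"),
    ("sp_B", "regeneration", "none"),
    ("sp_C", "regeneration", "partial"),
    ("sp_D", "regeneration", "none"),
}
verified_sample = {
    ("sp_A", "regeneration", "perfect"),
    ("sp_B", "regeneration", "none"),
    ("sp_E", "regeneration", "partial"),
}
print(precision_recall_f1(extracted_records, verified_sample))
```

Reporting these figures alongside the method lets a third party judge whether a reproduced run on another dataset performs comparably.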

Public communication: All proposals must include a plan for updating the interested public on the progress of the work. Blog posts are recommended, either contributed to appropriate community blogs, to the EOL blog, or in your own independent blog, but other appropriate formats will be considered. Brief updates should be made at least monthly and public comment and dialogue should be supported. These updates will be further disseminated through EOL social media.

Licensing: Unless otherwise declared to be in the public domain, the dataset produced and any other information generated by a Fellow in the course of supported activities is owned by the Fellow and is expected to be published under an accepted license, and deposited at EOL if the data type is appropriate. New code should be published as open source under the MIT License.


Proposals will be reviewed by qualified researchers in the biological and information sciences. Award notifications are expected by January 2013. Support or recommendations both from colleagues familiar with the proposer's work and from potential dataset users will be an important factor in the review process. 


Total proposal budgets should not exceed US $50,000. Funding may be spread over up to ten months. Start dates may vary, but projects may not begin before March 1, 2013, or later than October 2013.

Benefits are not provided as part of an EOL Rubenstein Fellows award. Fellows may choose to use a portion of their effort compensation to pay for health benefits, but funds for benefits cannot be requested as part of a budget. Funds may not be used for institutional overhead: home institutions of successful Fellows will not administer funds and cannot charge overhead. Funds are administered independently as contracts with successful Fellows.

Open Access publishing fees are allowable expenses. Travel and equipment expenses are allowable if justified. Purchased equipment and unconsumed supplies are owned by Fellows following the term of the project.

Award Details

Funding will be administered directly to the Fellow as a contract. The budget should be calculated from hourly rates and expected effort, plus non-personnel expenses. However, invoicing will be based first on milestones and then on deliverables as the project progresses.

Fellows will be required to maintain regular communication with EOL staff regarding progress towards proposal goals and any changes. They are expected to be available by email, or phone if preferred, and to promptly advise EOL staff of any unforeseen changes to the project schedule or expected results. Failure to communicate effectively or meet progress goals may result in termination of funding.

Application process

All proposal materials must be submitted by 11:59pm GMT, November 15th, 2012.

Each applicant must submit the following materials in English, by email, to the EOL Fellows coordinator at hammockj AT si.edu:

  1. An application package, including a project description (up to 5 pages), a one-page literature cited supplement if needed, and a budget. The project description should describe the proposed activities and products in detail, including: existing EOL content to be leveraged, new resources (if any) to be added to the EOL collection, outside data sources to be used (if any), detailed methods including evaluation methods, and the structure, format, expected size and scope of the dataset produced. The budget should detail the proposed effort of each participant and justify any equipment or travel expenses requested.
  2. Resumé: a resumé or CV should be included for all participants whose effort is included in the project budget. For proposals including analysis, a CV should be included for each participating researcher whether supported in the budget or not.
  3. Letters of recommendation: Applicants should request one to three letters of recommendation from clients or professional colleagues detailing the applicant's expertise, the quality of their body of work, and the quality of the proposal. Letters should not exceed one page. Potential Fellows may not submit letters of recommendation as part of their application package; these should be emailed directly by the recommender, with the candidate Fellow's name in the subject line.
  4. Letters of support: Applicants should also request up to five letters of support from potential users of the proposed dataset. For proposals including analysis this should include at least one letter from a researcher committed to participating in the proposed analysis, and may include others from interested potential users.

Please see the Wishes for Research FAQ for advice about estimating the current scope and coverage of EOL content. Direct questions about proposals or the Fellows program to the EOL Fellows Coordinator at hammockj AT si.edu.

Please see the EOL Research Ideas Community for current discussion and concerns previously raised.

The Encyclopedia of Life has partnered with CRDF to help administer the EOL Rubenstein Fellows program. Established in 1995 by the National Science Foundation, CRDF (www.crdf.org) promotes international scientific and technical collaboration through grants, technical resources, and training.