Unique IDs for individual (digital) specimens from natural history museums streamline and future-proof science

03 Oct 2024, 16:23 by Pensoft Publishers

This article has been reviewed according to Science X's editorial process and policies. Editors have highlighted the following attributes while ensuring the content's credibility:

fact-checked

peer-reviewed publication

trusted source

proofread

The jumping spider Chrysilla lauta, one of the species studied in the first scientific publication to use digital specimen DOIs. Credit: Melvyn Yeo

The wealth of data hosted in natural history collections can contribute to finding a response to global challenges ranging from climate change to biodiversity loss to pandemics. However, today's practices of working with collected bio- and geodiversity specimens lack some efficiency, which limits what scientists can achieve.

In particular, there is a serious lack of linkages between data centered around specimens and coming from various databases (e.g. ecological and genomic data), which poses significant obstacles when researchers attempt to work with specimens from multiple collections.

Now, a publication in the open-access Biodiversity Data Journal becomes the first to demonstrate novel workflows to further digitize and future-proof biodiversity data. The paper updates the knowledge about two genera of jumping spiders from across two collections and describes a newly discovered species by utilizing novel workflows and formats: digital specimen DOIs and nanopublications.

Persistent identifiers: The DOIs

Several initiatives have been launched in recent years to establish a globally accepted system of persistent identifiers (PIDs) that guarantee the "uniqueness" of collection specimens—physical or digital—over time.

You can think of a PID as a marker, an identifier that points at a single individual object and only one, differentiating it from any other in the world. You must have heard of acronyms such as ISBN or ORCID. Those are PIDs for books and individual scholars, respectively. For digital research content, the most widely used PID is the DOI (Digital Object Identifier), proposed by the DOI Foundation.

A DOI is an alphanumeric code that looks like this: 10.prefix/sufix

For example, if you type "doi.org/10.15468/w6ubjx" in your browser, you will reach the Royal Belgian Institute of Natural Sciences' mollusk collection database, accessed through GBIF. This specific DOI will never point at anything else, and the identifier will remain the same in the future, even if changes occur in the content of this particular database.

DiSSCo and the DOIs

The Distributed System of Scientific Collections (DiSSCo) aims to provide a DOI for all individual digital specimens in European natural history collections. The point is not only to accurately identify specimens. That is, of course, crucial, but the DOI of a digital specimen provides a number of other advantages that are extremely interesting for DiSSCo and natural history collections in general. Among them, two are simply revolutionary.

Firstly, using DOIs allows linking the digital specimen to all other relevant information about the same specimen that might be hosted in other repositories (e.g. ecological data, genomic data, etc.).

In creating this extended digital specimen that links different data types, digital specimen DOIs make a huge contribution to inter-institutional scientific work, filling the gap that is described at the beginning of this piece. Now scientists will be in a much better position to really exchange and link data across institutions.

Second, in contrast to most other persistent identifiers, the DOI of a digital specimen stores additional metadata (e.g. name, catalog number) beyond the URL to which it redirects. This allows access to some information about the specimen without having to retrieve the full data object, i.e. without having to be redirected to the specimen HTML page.

This metadata facilitates AI systems to quickly navigate billions of digital specimens and perform different automated work on them, saving us (humans) precious time.

Use of DOIs in publications

With all this in mind, it is easier to understand why being able to cite digital specimens in scholarly publications using DOIs is an important step. So far, the only DOIs that we could use in publications were those at the dataset level, not at the individual specimen level.

In the example above, if a scientist were to publish an article about a specific type of bivalve in the Belgian collection, the only DOI that she or he would have available for citation in the article would be that of the entire mollusk database -containing hundreds or thousands of specimens- not the one of the specific oyster or scallop that might be the focus of the publication.

The publication in Biodiversity Data Journal about the Chrysilla and Phintelloides genera is the first of its kind and opens the door to citing not only dataset-level objects but also individual specimens in publications using DOIs.

You can try it yourself: Hover over the DOIs that are cited in the publication and you will get some basic information that might save you the time of visiting the page of the institution where the specimen is.

Click on it and you will be taken to DiSSCo's sandbox -the future DiSSCover service- where you will find all the information about the digital specimen. There you will also be able to comment, annotate the specimen, and more, thus making science in a more dynamic and efficient way than until now.

More information: Christa Deeleman-Reinhold et al, The genera Chrysilla and Phintelloides revisited with the description of a new species (Araneae, Salticidae) using digital specimen DOIs and nanopublications, Biodiversity Data Journal (2024). DOI: 10.3897/BDJ.12.e129438

Journal information: Biodiversity Data Journal

Provided by Pensoft Publishers