For more than five years, the non-profit organisation DataCite has been campaigning for easier access to research data and its improved traceability. The organisation’s aim is not solely to increase the visibility of research data on the internet, but also to boost its acceptance as a relevant, citable component of an academic performance record.
One way to achieve this goal is via the registration of Digital Object Identifiers (DOIs) for research data. DataCite is an official registration agency of the International DOI Foundation. The twenty-four current DataCite members, which include ETH Zurich’s DOI-Desk, enable their customers, so-called data centres, to register DOI names for research data that is accessible online.
Fig. 1: One step at a time towards DOI registration (slides on ETH-Bibliothek’s Slideshare account)
A DOI registration offers scientists a guarantee that the data they have published will remain permanently addressable via a stable identifier. They can add the published research data to their bibliography and afford it the appropriate significance in their track record. Other scientists are able to cite the dataset and refer to the data and its producer. In the ideal scenario, the standardised reference to a dataset in an academic publication means that the data producer notices the new citation via tools such as Google Scholar, Data Citation Index or Impactstory and is thus kept up to date on how their data is being re-used.
So much for the theory, anyway. In practice, we currently face the following challenges:
- The majority of research data is still not publicly accessible
- Research data is not described uniformly, if at all, using metadata
- To date, there is no accepted standard for the referencing and citation of research data
For this reason, DataCite has opened up another field of activity besides DOI registration to champion its cause: the development of standards and best practices for the publication of research data.
Describing research data
To date, established standards for the description of research data have only taken shape in a handful of disciplines, such as the social sciences. The development of a cross-discipline metadata scheme for research data was therefore one of the initial priorities upon the foundation of DataCite at the end of 2009. Such a scheme could also serve as a data exchange format for subject-specific research data archives and thus improve their interoperability.
The DataCite Metadata Schema, the first version of which was published in 2011, fulfils this aim. DataCite refines the scheme continuously, and it has largely established itself as a quasi-standard for the description of research data.
Fig. 2: Obligatory metadata to be indicated within the DataCite scheme
One of its most important features is the ability to describe semantic relationships between research datasets, their versions or components, and between publications and research datasets in a standardised way via specific metadata elements.
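As a rough illustration of such relationships, the following sketch models typed links from a dataset to an earlier version and to a citing publication. The relation names `IsNewVersionOf` and `IsCitedBy` follow the relation vocabulary of the DataCite scheme; the `Relation` class, the `related_by` helper and both DOI names are hypothetical and serve only as an example, not as the scheme's actual serialisation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    """One typed link from a described resource to a related resource."""
    relation_type: str        # e.g. "IsNewVersionOf", "IsCitedBy"
    related_identifier: str   # identifier of the related resource
    identifier_type: str = "DOI"


def related_by(relations, relation_type):
    """Return the identifiers that are linked via the given relation type."""
    return [r.related_identifier for r in relations
            if r.relation_type == relation_type]


# Hypothetical relations for one dataset; both DOI names are invented.
relations = [
    Relation("IsNewVersionOf", "10.1234/example-dataset-v1"),
    Relation("IsCitedBy", "10.1234/example-article"),
]

print(related_by(relations, "IsCitedBy"))
```

Keeping the relation type as an explicit field is what allows a repository or search service to answer questions such as "which publications cite this dataset?" without parsing free text.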
Based on the DataCite metadata scheme, DataCite recommends citing a research dataset as follows:
Creator (PublicationYear): Title. Version. Publisher. ResourceType. Identifier
A practical example should make this even clearer:
Swaminathan, R., Ramya, T., Karthik, C.S. (2013): Contortrostatin-Reprolysin Domain Structure. Swiss Institute of Bioinformatics. http://doi.org/10.5452/ma-c12zs
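The recommended pattern can be assembled mechanically from a metadata record. The sketch below is a minimal, unofficial illustration: the function name `format_citation` and its parameters are hypothetical, and it assumes that Version and ResourceType may simply be left out when they are not available, as in the example above.

```python
def format_citation(creators, year, title, publisher, identifier,
                    version=None, resource_type=None):
    """Build a citation in the recommended element order.

    Pattern: Creator (PublicationYear): Title. Version. Publisher.
    ResourceType. Identifier
    Assumption: version and resource_type are skipped when not given.
    """
    parts = [title]
    if version:
        parts.append(version)
    parts.append(publisher)
    if resource_type:
        parts.append(resource_type)
    head = f"{', '.join(creators)} ({year}): "
    return head + ". ".join(parts) + ". " + identifier


citation = format_citation(
    creators=["Swaminathan, R.", "Ramya, T.", "Karthik, C.S."],
    year=2013,
    title="Contortrostatin-Reprolysin Domain Structure",
    publisher="Swiss Institute of Bioinformatics",
    identifier="http://doi.org/10.5452/ma-c12zs",
)
print(citation)
```

Called with the values from the practical example, the function reproduces that citation string; passing `version` or `resource_type` inserts the additional elements at the positions given in the pattern.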
The importance of this recommendation lies less in how and in which order the elements of the citation are listed than in the underlying demand that research data be cited in a standardised form at all, whether on websites, in bibliographies or in scientific publications. To underline this point, DataCite is also one of the signatories of the “Joint Declaration of Data Citation Principles”.
Publishing research data
In collaborating with their customers, the operators of research data archives, DataCite members often have the opportunity to work towards the implementation of important standards and to showcase best practice examples. For instance, DataCite recommends that a DOI name should never resolve to a research dataset directly, but rather to a so-called landing page.
Fig. 3: Example of a published dataset: landing page for an anatomical 3D model of the IT’IS Foundation, http://doi.org/10.13099/ViP-Thelonious-V2.0
The landing page provides users with descriptive information on the dataset, enabling them to place the data in context and re-use it in the interests of the data producer. DataCite has also issued recommendations on other topics, such as the handling of changing data, so-called “dynamic data sets”.
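To make the link between a DOI name and its landing page concrete, the sketch below turns a DOI name into the resolver URL form used in this article. It is only an illustration: `doi_to_url` and `DOI_PATTERN` are hypothetical names, and the pattern is a deliberately simplified plausibility check (prefix starting with `10.`, a slash, then a suffix), not a complete validation of DOI syntax. Following the URL in a browser then leads to whatever landing page the data centre has registered.

```python
import re

# Assumed, simplified shape of a DOI name: "10.", a numeric registrant
# code, a slash and a non-empty suffix. Real DOI syntax allows more.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def doi_to_url(doi_name, resolver="http://doi.org/"):
    """Return the resolver URL for a DOI name, or raise ValueError."""
    doi_name = doi_name.strip()
    if not DOI_PATTERN.match(doi_name):
        raise ValueError(f"not a plausible DOI name: {doi_name!r}")
    return resolver + doi_name


# DOI name taken from Fig. 3.
print(doi_to_url("10.13099/ViP-Thelonious-V2.0"))
```

Because the citation carries only the DOI name, the data centre can move the landing page to a new address and update the registered URL without breaking existing citations.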
Experiences from practice
Day-to-day work at ETH Zurich’s DOI-Desk shows that the discussion on the publication of research data is gaining in importance at Swiss research facilities. In many areas, however, the technical and operational prerequisites for developing comprehensive services are still lacking. In the initial contact, the priority of potential DOI-Desk customers is often to publish existing data with DOIs and render it citable. Frequently, however, the customer first has to meet the following prerequisites:
- A guarantee of the persistence and permanent availability of the digital objects
- The recording of metadata
- The provision of landing pages
As a result, the seemingly straightforward desire for the registration of DOI names frequently results in a longer advisory process about the requirements for electronic publication platforms for research data and other non-traditional publication formats. In an ideal scenario, the new DOI customer will have taken an important step on the path towards a trustworthy research data repository by the end of this process. And the DOI-Desk will have gained another customer, whose research data is easy to find and available for a global public to use.
Jan Brase, Irina Sens, Michael Lautenschlager (2015): “The Tenth Anniversary of Assigning DOI Names to Scientific Data and a Five Year History of DataCite”. D-Lib Magazine 21(1/2). http://doi.org/10.1045/january2015-brase
Joan Starr, Angela Gastl (2011): “isCitedBy: A Metadata Scheme for DataCite”. D-Lib Magazine 17(2). http://doi.org/10.1045/january2011-starr
Joan Starr, Eleni Castro, Mercè Crosas et al. (2015): “Achieving human and machine accessibility of cited data in scholarly publications”. PeerJ Preprints 3:e697v4. http://doi.org/10.7287/peerj.preprints.697v4
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International Public License.
DOI Link: 10.16911/ethz-ib-1797-en