Persistent identifiers as a catalyst for open research data

For more than five years, the non-profit organisation DataCite has been campaigning for easier access to research data and its improved traceability [1]. The organisation’s aim is not solely to increase the visibility of research data on the internet, but also to boost its acceptance as a relevant, citable component of an academic performance record.

One way to achieve this goal is via the registration of Digital Object Identifiers (DOIs) for research data. DataCite is an official registration agency for the International DOI Foundation. The twenty-four current DataCite members, which include ETH-Zurich’s DOI-Desk, enable their customers, so-called data centres, to register DOI names for research data that is accessible online.


Fig. 1: One step at a time towards DOI registration (slides on ETH-Bibliothek’s Slideshare account)

A DOI registration offers scientists a guarantee that the data they have published will remain permanently addressable via a stable identifier. They can add the published research data to their bibliography and afford it the appropriate significance in their track record. Other scientists are able to cite the dataset and refer to the data and its producer. In the ideal scenario, the standardised reference to a dataset in an academic publication results in the data producer noticing the new citation via tools such as Google Scholar, Data Citation Index or Impactstory and thus constantly being kept up to date with the re-usage of their data.

So much for the theory, anyway. In practice, we currently face the following challenges:

  • The majority of research data is still not publicly accessible
  • Research data is not described uniformly, if at all, using metadata
  • To date, there is no accepted standard for the referencing and citation of research data

For this reason, DataCite has opened up another field of activity besides DOI registration to champion its cause: the development of standards and best practices for the publication of research data.

Describing research data

To date, established standards for the description of research data have only taken shape in a handful of disciplines, such as the social sciences. The development of a cross-discipline metadata scheme for research data was therefore one of the initial priorities upon the foundation of DataCite at the end of 2009. Such a scheme could also serve as a data exchange format for subject-specific research data archives and thus improve their interoperability.

The DataCite Metadata Schema, the first version of which was published in 2011, fulfils this aim. The scheme is honed constantly by DataCite and has largely established itself as a quasi-standard for the description of research data today.


Fig. 2: Obligatory metadata to be indicated within the DataCite scheme

One of its most important features is the possibility to describe semantic relationships between research datasets, their versions or components, and between publications and research datasets in a standardised way via specific metadata elements [2].
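As a rough illustration of how such a record and its relationships might be represented, the following Python sketch models a dataset description with one related identifier. The class and field names, the example DOIs and the data centre name are hypothetical. The list of mandatory properties reflects version 3.x of the DataCite Metadata Schema as far as we are aware and should be checked against the current schema documentation; `IsCitedBy` is one of the relation types defined there.

```python
from dataclasses import dataclass, field

# Assumption: mandatory properties as in DataCite Metadata Schema 3.x.
# Verify against the current schema documentation before relying on this.
MANDATORY = ("identifier", "creators", "title", "publisher", "publication_year")


@dataclass
class RelatedIdentifier:
    value: str          # identifier of the related publication or dataset
    relation_type: str  # e.g. "IsCitedBy", "IsNewVersionOf", "IsPartOf"


@dataclass
class DatasetRecord:
    identifier: str = ""
    creators: list = field(default_factory=list)
    title: str = ""
    publisher: str = ""
    publication_year: str = ""
    related: list = field(default_factory=list)

    def missing_mandatory(self):
        """Return the names of mandatory properties that are still empty."""
        return [name for name in MANDATORY if not getattr(self, name)]


# Hypothetical example values, for illustration only.
record = DatasetRecord(
    identifier="10.1234/example-dataset",
    creators=["Doe, J."],
    title="Example dataset",
    publisher="Example Data Centre",
    publication_year="2015",
    related=[RelatedIdentifier("10.1234/example-article", "IsCitedBy")],
)

print(record.missing_mandatory())
```

A record that lacks, say, a publisher would be reported by `missing_mandatory()` before any registration is attempted, while the `related` list carries the semantic links between datasets, versions and publications.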

Citing research data

Based on the DataCite metadata scheme, DataCite recommends citing a research dataset as follows:

Creator (PublicationYear): Title. Version. Publisher. ResourceType. Identifier

A practical example should make this even clearer:

Swaminathan, R., Ramya, T., Karthik, C.S. (2013): Contortrostatin-Reprolysin Domain Structure. Swiss Institute of Bioinformatics.
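The recommended pattern can be sketched as a small Python helper. This is only an illustration under our own assumptions: the function name and its parameters are hypothetical and not part of any DataCite tool, and optional elements that are not supplied (version, resource type, identifier) are simply left out, as in the example above.

```python
def format_citation(creators, year, title, publisher,
                    version=None, resource_type=None, identifier=None):
    """Build a citation following the pattern
    Creator (PublicationYear): Title. Version. Publisher. ResourceType. Identifier

    Elements passed as None are omitted from the result.
    """
    head = f"{', '.join(creators)} ({year}): "
    # Keep the recommended order and skip missing optional elements.
    parts = [title, version, publisher, resource_type]
    text = head + ". ".join(p for p in parts if p) + "."
    if identifier:
        text += " " + identifier
    return text


citation = format_citation(
    ["Swaminathan, R.", "Ramya, T.", "Karthik, C.S."],
    2013,
    "Contortrostatin-Reprolysin Domain Structure",
    "Swiss Institute of Bioinformatics",
)
print(citation)
```

Called with these values, the helper reproduces the example citation given above; passing `version`, `resource_type` or `identifier` would append those elements in the recommended order.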

The importance of this recommendation lies less in which elements are listed and in what order than in the underlying demand that research data be cited in a standardised form at all, whether on websites, in bibliographies or in scientific publications. To underline this point, DataCite is also one of the signatories of the “Joint Declaration of Data Citation Principles”.

Publishing research data

In collaborating with their customers, the operators of research data archives, DataCite members often have the opportunity to work towards the implementation of important standards and to showcase best-practice examples. For instance, DataCite recommends that a DOI name should never resolve to a research dataset directly, but rather to a so-called landing page.


Fig. 3: Example of a published dataset: landing page for an anatomical 3D model of the IT’IS Foundation

The landing page provides users with descriptive information on the dataset, enabling them to sort the data by content and re-use it in the interests of the data producer [3]. DataCite has also issued recommendations on other topics, such as handling changing, so-called “dynamic data sets”.
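To make the resolution step concrete: a DOI name is turned into a link by appending it to a resolver address, and the resolver then redirects to whatever URL was registered for it, ideally the landing page. The following Python sketch only builds that link and performs no network access. The function name and the simple validity check are our own assumptions; `https://doi.org/` is the public DOI resolver.

```python
from urllib.parse import quote

KNOWN_PREFIXES = ("doi:", "https://doi.org/", "http://dx.doi.org/")


def doi_to_url(doi, resolver="https://doi.org/"):
    """Turn a DOI name into a resolver link (no network access involved)."""
    doi = doi.strip()
    # Accept a few common ways of writing a DOI and reduce them to the bare name.
    for prefix in KNOWN_PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    # Simplified plausibility check: DOI names start with "10." and
    # contain a slash between prefix and suffix.
    if not doi.startswith("10.") or "/" not in doi:
        raise ValueError(f"not a DOI name: {doi!r}")
    # Percent-encode characters that are not safe in a URL path.
    return resolver + quote(doi, safe="/")


print(doi_to_url("10.16911/ethz-ib-1797-en"))
```

Because the link contains only the DOI name, the data centre can move the landing page to a new address and update the registered URL without breaking existing citations.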

Experiences from practice

Day-to-day work at ETH Zurich’s DOI-Desk shows that the discussion on the publication of research data is gaining in importance at Swiss research facilities. In many areas, however, the technical and operational prerequisites for developing comprehensive services are still lacking. In the initial contact, potential DOI-Desk customers are often primarily interested in publishing existing data with DOIs and rendering it citable. Frequently, however, the following prerequisites must first be put in place on the customer’s side:

  • A guarantee of the persistence and permanent availability of the digital objects
  • The recording of metadata
  • The provision of landing pages

As a result, the seemingly straightforward desire for the registration of DOI names frequently results in a longer advisory process about the requirements for electronic publication platforms for research data and other non-traditional publication formats. In an ideal scenario, the new DOI customer will have taken an important step on the path towards a trustworthy research data repository by the end of this process. And the DOI-Desk will have gained another customer, whose research data is easy to find and available for a global public to use.

[1] Jan Brase, Irina Sens, Michael Lautenschlager (2015): “The Tenth Anniversary of Assigning DOI Names to Scientific Data and a Five Year History of DataCite”. D-Lib Magazine 21(1/2).

[2] Joan Starr, Angela Gastl (2011): “isCitedBy: A Metadata Scheme for DataCite”. D-Lib Magazine 17(2).

[3] Joan Starr, Eleni Castro, Mercè Crosas et al. (2015): “Achieving human and machine accessibility of cited data in scholarly publications”. PeerJ Preprints 3:e697v4.

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International Public License.

DOI Link: 10.16911/ethz-ib-1797-en
