The circular economy of research data: Development of Databank and Data Catalogue

More efficient use of existing research data saves time and money. By preserving and sharing research data, we promote their circularity and enable more multidisciplinary, higher-quality research. By making valuable data visible, we strengthen the reputation of both researchers and the University.

For this purpose, we created the University of Helsinki Databank for storing research data and the data catalogue for describing data and making them findable. We promote the circular economy of research data by supporting the realisation of the FAIR Principles (Findable, Accessible, Interoperable, Reusable). The data model we designed complies with metadata standards, which makes the data findable by machines. Datasets are assigned persistent identifiers, digital object identifiers (DOIs), to enable reliable citation. In addition, our service process includes data curation, which helps researchers preserve and make efficient use of valuable research data.

Databank was created in response to a need from researchers

Development of the Databank began in spring 2022, based on a survey of researchers' needs for storing research data. Researchers wanted better solutions for storing data after their projects had ended, which led the University of Helsinki to decide to invest in data storage and curation. In 2023, we launched a project entitled Long-term storage of research data for the University of Helsinki (TPAT), in which Helsinki University Library, the IT Centre and Research Services developed a new storage service. In addition to a technical solution, the project aimed to develop data curation.

From TPAT project to Databank

In spring 2024, the Databank opened to University of Helsinki researchers, enabling them to store research data for 5 to 15 years. We developed an order form and a management tool for handling service requests and storage decisions. Dataset metadata (title, abstract, description of the data's value, etc.) are an integral part of Databank curation, because faculty-specific research committees or separately appointed databank committees decide whether to grant storage space on the basis of these details.

The Databank offers long-term storage for large datasets, freeing up group storage drives and other, more expensive storage intended for active data processing. The Databank also makes large datasets available through their metadata, even without direct download links. For large datasets that would take a long time to download over the web, this is in fact the most practical way to provide access. As the Databank has no public user interface, a data catalogue was needed to make it easier to find and request data.

Research data catalogue and search service

The data catalogue was developed to manage research data stored in the Databank and serves as a record of metadata for research data produced at the University of Helsinki. Its purpose is to provide an overview of the University's research data and to enable searching, browsing and the production of metadata for research data. The catalogue collects metadata on research data that are produced by the University community and stored on various platforms.

The data catalogue is like a library catalogue that helps users find and manage research data. It does not contain the research data themselves, only descriptive information about them, or metadata. This information tells users which datasets are available and where, making it possible to use and cite them.

New service based on familiar tools

Development of the data catalogue began immediately after the launch of the Databank. Unlike the Databank, the data catalogue was designed as a Helsinki University Library project without external technical partners. The IT platform was selected from available software to meet the University's needs. In the end, we chose the DSpace publishing platform, which was already in use, for example, in the open digital repository Helda. Using an existing tool made development smooth and, now that the project has ended, allows the library's specialist team to manage the service on an ongoing basis.

A key feature of the data catalogue is automated metadata harvesting from various databases and research data platforms. The main harvesting targets include data repositories such as Zenodo, Etsin and Dryad, as well as other storage solutions used by the University community. Thanks to automation, the catalogue can collect metadata on a large number of research datasets without manual effort.
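To illustrate what harvesting from several sources involves, here is a minimal Python sketch of one step: normalising records from two sources into a common shape and merging them so that a dataset harvested twice appears only once. It assumes DOIs are the matching key. The record shapes, field names (`doi`, `metadata`, `identifier`, `name`, `author`), the `CatalogueRecord` class and the example DOIs are all hypothetical; they do not reproduce the actual APIs of Zenodo, Etsin or Dryad or the catalogue's DSpace configuration.

```python
import re
from dataclasses import dataclass, field
from typing import Optional


# Hypothetical common record shape used only in this sketch
@dataclass
class CatalogueRecord:
    doi: str
    title: str
    creators: list = field(default_factory=list)
    source: str = ""


# A DOI starts with "10.", a registrant prefix, a slash and a suffix
DOI_PATTERN = re.compile(r"10\.\d{4,9}/\S+")


def normalise_doi(raw: str) -> Optional[str]:
    """Extract a bare DOI from forms like 'doi:10...' or 'https://doi.org/10...'."""
    match = DOI_PATTERN.search(raw or "")
    # DOIs are case-insensitive, so lower-case them for comparison
    return match.group(0).lower() if match else None


def from_source_a(raw: dict) -> Optional[CatalogueRecord]:
    """Map a record from hypothetical source A (DOI plus nested metadata)."""
    doi = normalise_doi(raw.get("doi", ""))
    if doi is None:
        return None
    meta = raw.get("metadata", {})
    creators = [c["name"] for c in meta.get("creators", [])]
    return CatalogueRecord(doi, meta.get("title", ""), creators, "source_a")


def from_source_b(raw: dict) -> Optional[CatalogueRecord]:
    """Map a record from hypothetical source B (flat fields, DOI given as a URL)."""
    doi = normalise_doi(raw.get("identifier", ""))
    if doi is None:
        return None
    return CatalogueRecord(doi, raw.get("name", ""), list(raw.get("author", [])), "source_b")


MAPPERS = {"source_a": from_source_a, "source_b": from_source_b}


def harvest(batches: dict) -> dict:
    """Merge harvested batches into one catalogue keyed by normalised DOI."""
    catalogue = {}
    for source, records in batches.items():
        for raw in records:
            record = MAPPERS[source](raw)
            if record is None:
                # No DOI found: in this sketch such records are left for manual input
                continue
            # Keep the first record seen for each DOI and skip later duplicates
            catalogue.setdefault(record.doi, record)
    return catalogue


# Made-up example batches; 10.5555 is used here as a placeholder prefix
batches = {
    "source_a": [
        {"doi": "10.5555/example.001",
         "metadata": {"title": "Lake sediment cores", "creators": [{"name": "Virtanen, A."}]}},
    ],
    "source_b": [
        {"identifier": "https://doi.org/10.5555/EXAMPLE.001",
         "name": "Lake sediment cores", "author": ["Virtanen, A."]},
        {"identifier": "no persistent identifier", "name": "Interview notes", "author": []},
    ],
}

catalogue = harvest(batches)
print(sorted(catalogue))  # ['10.5555/example.001']
```

Keying on the DOI keeps the sketch simple; records without a DOI, like the interview notes in the example, would need another matching strategy or manual entry.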

Metadata can also be added to the catalogue through a manual input form, which lets researchers add and edit the metadata of the research data they produce themselves. This is particularly important where automated harvesting is not possible, for example when data are stored in a service that does not provide information on authors or their home organisations.

User testing and involvement of the research community to ensure ease of use

The usability of the data catalogue was a central part of the development process. The service had to be intuitive and easy to use, whether the user is a student, a researcher or another member of the academic community. User testing assessed how researchers and other users understand the catalogue's operating principles and how well its search features work for different disciplines.

As part of the development process, we interviewed the vice-deans responsible for research in six faculties. The discussions showed that practices for describing research data and making them available vary considerably between disciplines. For example, data in the humanities may consist of archival sources and interview materials, while in the natural sciences datasets often involve extensive measurement data or computational models.

Researchers remained engaged through user testing, in which the data catalogue was trialled both independently and in controlled test situations. A total of 29 researchers took part in either observational testing or a questionnaire-based assessment. Testing showed that the browsing feature did not work as expected, so we decided to remove it almost entirely. Instead, development focused on an improved search feature that makes it easy for users to find the data they are looking for.

Compatible architecture and information model

The data model for the data catalogue was designed to be compatible with national and international research data standards. Particular consideration was given to the DCAT standard used by the Metax metadata repository maintained by CSC as well as the DataCite Metadata Schema. This makes it possible to link the data contained in the data catalogue to broader European and international data repositories, such as the OpenAIRE database.
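As an illustration of what standards compatibility can mean in practice, the following Python sketch maps one internal catalogue record to two outputs: a dictionary carrying the mandatory properties of the DataCite Metadata Schema (identifier, creator, title, publisher, publication year and resource type) and a simplified DCAT description serialised as JSON-LD. The internal field names, the example record and the output key layout are assumptions made for this sketch; it is not the catalogue's actual data model, Metax's application profile or an exact DataCite serialisation. In full DCAT, creators and publishers would be agent resources rather than plain strings.

```python
import json

# Hypothetical internal record; field names are illustrative only
record = {
    "doi": "10.5555/example.001",
    "title": "Lake sediment cores",
    "description": "Example description of a measurement dataset.",
    "creators": ["Virtanen, A."],
    "publisher": "University of Helsinki",
    "year": 2024,
    "keywords": ["limnology", "sediment"],
}

# Internal fields needed to fill DataCite's mandatory properties;
# the resource type is fixed to "Dataset" below
REQUIRED = ("doi", "creators", "title", "publisher", "year")


def missing_fields(rec: dict) -> list:
    """Return required internal fields that are absent or empty."""
    return [key for key in REQUIRED if not rec.get(key)]


def to_datacite(rec: dict) -> dict:
    """Map the record onto DataCite's mandatory properties (simplified layout)."""
    return {
        "identifier": {"identifier": rec["doi"], "identifierType": "DOI"},
        "creators": [{"name": name} for name in rec["creators"]],
        "titles": [{"title": rec["title"]}],
        "publisher": rec["publisher"],
        "publicationYear": str(rec["year"]),
        "types": {"resourceTypeGeneral": "Dataset"},
    }


def to_dcat_jsonld(rec: dict) -> dict:
    """Describe the record as a dcat:Dataset using DCAT and Dublin Core terms."""
    return {
        "@context": {
            "dcat": "http://www.w3.org/ns/dcat#",
            "dct": "http://purl.org/dc/terms/",
        },
        "@id": f"https://doi.org/{rec['doi']}",
        "@type": "dcat:Dataset",
        "dct:identifier": rec["doi"],
        "dct:title": rec["title"],
        "dct:description": rec["description"],
        "dct:creator": rec["creators"],  # plain strings here for brevity
        "dct:publisher": rec["publisher"],
        "dct:issued": str(rec["year"]),
        "dcat:keyword": rec["keywords"],
    }


# Only emit the outputs when every required field is present
if not missing_fields(record):
    print(json.dumps(to_datacite(record), indent=2))
    print(json.dumps(to_dcat_jsonld(record), indent=2))
```

Deriving both outputs from a single internal record is one simple way to keep them consistent with each other.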

The architecture of the data catalogue was built around harvestable data pools and repositories and the integrations between them. For example, DataCite integration makes it possible to use the persistent identifiers (DOIs) assigned to research datasets. In addition, the catalogue's data model was designed to accommodate future extensions, such as access management for sensitive data and the definition of data access rights.

FAIR principles to guide further development

The data catalogue promotes the FAIR Principles by making research data findable and citable. It supports open science by giving researchers a tool for managing their data and making them visible. In particular, persistent identifiers and a standardised metadata model make citation and further use easier.
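A persistent identifier combined with standard metadata fields is enough to generate a citation automatically. The Python sketch below builds a plain-text dataset citation in a simple author-year pattern, loosely similar to the format DataCite recommends, and writes the DOI as a resolvable https://doi.org/ link. The function name, the "et al." rule and the example values are assumptions for illustration; DataCite's own guidance gives the exact recommended form.

```python
from typing import List, Optional


def format_citation(creators: List[str], year: int, title: str, publisher: str,
                    doi: str, version: Optional[str] = None) -> str:
    """Build a plain-text dataset citation from DataCite-style metadata fields."""
    # List up to three creators, then abbreviate (an arbitrary rule for this sketch)
    if len(creators) > 3:
        authors = "; ".join(creators[:3]) + " et al."
    else:
        authors = "; ".join(creators)
    parts = [f"{authors} ({year}). {title}."]
    if version:
        parts.append(f"Version {version}.")
    parts.append(f"{publisher}.")
    parts.append("Dataset.")
    # Display the DOI as a resolvable link rather than a bare string
    parts.append(f"https://doi.org/{doi}")
    return " ".join(parts)


# Made-up example values
print(format_citation(["Virtanen, A.", "Korhonen, B."], 2024,
                      "Lake sediment cores", "University of Helsinki",
                      "10.5555/example.001"))
# Virtanen, A.; Korhonen, B. (2024). Lake sediment cores. University of Helsinki. Dataset. https://doi.org/10.5555/example.001
```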

Researchers find the data catalogue useful in many ways:

  • It makes it easier to cite and find research data.
  • It makes visible even data that cannot be openly shared.
  • It supports teaching and the selection of thesis topics by providing an overview of data produced at the University.
  • It creates new collaboration opportunities between research groups and the academic community.

Development of the data catalogue, the Databank and other research data management services will continue. The aim is to improve the user experience and FAIR compliance of the services and to respond to researchers' new needs. The next development targets include a storage service for sensitive data and solutions for making data available.

At the same time, a redesign of the website for research data management is being planned, with the aim of facilitating researchers’ access to the right services. One idea is to create a guiding form or a wizard feature that will help researchers find the most appropriate data service according to their needs.

University provides solutions

The data catalogue is a central component of research data management at the University of Helsinki. It brings together the metadata of research datasets, makes the datasets findable and supports their further use. Its development was researcher-oriented, and the service will continue to evolve to meet the needs of the University community.

It is the University’s duty to provide solutions and promote the utilisation of scholarly knowledge. The data catalogue is a step towards more efficient research data management and open science.

Mari Elisa Kuusniemi 
Matilda Mela 
Mikko Mäkelä 
Timo Lahtinen 
Niina Nurm
Information Specialists in the library’s data team