Research Data Stewardship Resources

RDSI Informational Webinar

Why research data stewardship is important

Sharing research data is important for many reasons, and it can benefit not only research (and researchers) but the public as well.

Benefits of Research Data Stewardship and Sharing
Further Reading

Funding Agency Research Data Policies

U.S. funding agency requirements for public access to research data have been evolving rapidly over the past several years. Below are links to a number of individual policies; however, given the fast-changing landscape around agency policies, we strongly recommend you confirm your funder’s policy before submitting any proposal. For agencies not in this list, see a curated list of data sharing requirements by federal agency provided by the Scholarly Publishing and Academic Resources Coalition (SPARC).

Selecting a Data Repository

A research data repository is a type of database infrastructure that hosts, manages, and disseminates research data for the purposes of making it publicly available for people to discover, analyze and reuse.

Types of Data Repositories

There are three types of research data repositories: 

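  • Domain Data Repositories – Domain data repositories serve a particular research field or discipline, or specialize in hosting, disseminating and preserving a particular type of data.
    • Example – ICPSR, a domain repository for the social sciences
  • Generalist Data Repositories – Generalist data repositories collect a wide variety of data types and do not limit deposits based on the institutional affiliation of the depositor.
    • Example – Dryad
  • Institutional Data Repositories – Institutional data repositories accept a wide variety of data, but place limits on who can deposit data based on institutional affiliation.
    • Example – Deep Blue Data, the Institutional Data Repository for the University of Michigan
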
Questions to Ask in Considering a Data Repository

Is there a Domain Repository that serves your research field?

Generally, the best place to deposit your research data is with a repository that is connected to your particular research field or discipline and/or specializes in hosting, disseminating and preserving the type of data that you have generated. Domain repositories work closely with the research communities that use the data they host to ensure that it is suitable and accessible for their specific needs. Depositing your data into a domain repository helps connect it with the community that is most likely to benefit from and engage with your work.

Not every research community has developed a domain repository of its own yet. In cases where a domain repository does not exist or is not suitable for your data, you may want to consider a generalist or institutional data repository.

Does your funding agency or institution have specific requirements for your data that you need to meet?

As you develop your grant application, pay special attention to the details of the funding agency's data sharing requirements and to any additional stipulations made by the specific program to which you are applying. Take these requirements into consideration both in selecting a data repository to host your data and in how you develop and document your data set for later reuse by others.

Some funding agencies provide guidance on selecting an appropriate repository for your data (for example, the NIH has a web page on Selecting a Data Repository). Your funding agency’s program officers may also be able to provide some guidance on selecting an appropriate repository.       

Are there any costs associated with depositing your data with a repository?

Data repositories, like other infrastructure that supports research, incur management and operational costs in providing their services. Some repositories charge fees to deposit or host data in order to cover their expenses. Be sure that you understand and budget for any fees associated with depositing data into a repository before identifying it as your repository of choice in your grant application. Many funding agencies permit the inclusion of expenses related to managing or disseminating the data from the research, but these expenses must be identified and accounted for in the project’s budget.

Are there any limitations on how much or what kind of data a repository can accept, host, disseminate or preserve?

Different repositories may have different limits or constraints on what kind of and how much data they can accept. Though it may be a bit challenging to do, it’s a good idea to estimate the size and complexity of the data that you will need to share as you plan your project in order to determine if the repository you are considering can reasonably host, disseminate and preserve it.   
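As a rough illustration of the estimate described above, the following Python sketch multiplies planned file counts by average file sizes and compares the total to a size limit. The file categories, counts, sizes, and the 100 GB limit are hypothetical placeholders, not figures from any particular project or repository.

```python
# Hypothetical project inventory: category -> (number of files, average size in MB).
planned_files = {
    "raw instrument output": (1200, 250.0),
    "processed tables": (300, 15.0),
    "documentation and code": (50, 0.5),
}


def estimate_total_gb(inventory):
    """Sum count * average size (MB) over all categories and convert to decimal GB."""
    total_mb = sum(count * avg_mb for count, avg_mb in inventory.values())
    return total_mb / 1000


def fits_within_limit(total_gb, limit_gb):
    """Return True if the estimated size does not exceed the stated limit."""
    return total_gb <= limit_gb


total_gb = estimate_total_gb(planned_files)

# Hypothetical per-deposit limit; check the repository's actual policy.
repository_limit_gb = 100

print(f"Estimated size: {total_gb:.1f} GB")
print("Within limit:", fits_within_limit(total_gb, repository_limit_gb))
```

An estimate like this is only a starting point. If it exceeds the limit, that is a prompt to contact the repository early or to consider which data products actually need to be shared.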

Does the repository take measures to preserve the data it hosts to ensure long term access?

Funding agencies generally expect that your data will be accessible well after the grant has ended. Repositories differ in their commitment to preservation and in the resources they dedicate to it. When considering a data repository, read through its preservation policy and note the preservation services it offers to be sure that your needs are addressed.

Does the repository have a stable source of long term funding? What happens to your data if the repository shuts down?

Data sharing requirements are still relatively new in many disciplines, so the repositories and other infrastructure being developed to support researchers may not have a dedicated or reliable source of funding in place. In addition to exploring the capabilities of the repository, take some time to investigate its business model and its financial stability. It’s also a good idea to understand the repository’s disaster management plan and what happens to the data it hosts if it ceases operations.

Does the repository support the use of unique identifiers? Can people cite your data set if they use it in their research?

Making your data publicly available to others should directly benefit you and increase your professional reputation. A repository that assigns Digital Object Identifiers (DOIs) or other unique identifiers to the data sets that it hosts enables others to cite a data set when they use it, just as they would cite a journal article or book.
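To illustrate how a DOI makes a data set citable, the sketch below assembles a citation string from a few descriptive fields. The function name, the field names, the citation layout, and the DOI value are all hypothetical placeholders; follow the citation format that your repository or publisher recommends. The only fixed convention used here is that a DOI can be written as a link by prefixing it with https://doi.org/.

```python
def format_data_citation(creators, year, title, repository, doi):
    """Build a simple data set citation ending in a resolvable DOI link.

    The layout is illustrative only and is not an official citation style.
    """
    names = "; ".join(creators)
    return f"{names} ({year}). {title} [Data set]. {repository}. https://doi.org/{doi}"


# All values below are made up for illustration.
citation = format_data_citation(
    creators=["Doe, J.", "Roe, R."],
    year=2024,
    title="Example survey responses",
    repository="Example Repository",
    doi="10.1234/example-doi",
)
print(citation)
```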

Does your data include human subjects or sensitive information that would require limitations on access?

Not all data can be shared publicly. Data that involve human subjects or other sensitive information may be subject to additional considerations when deciding on a repository. It’s important to work with your IRB to determine to what extent a data set can be shared outside of the study team and what measures will need to be taken to ensure that sharing the data set does not place subjects at risk. Repositories differ in their ability to handle sensitive data. Some cannot accept sensitive data that has not been sufficiently deidentified, and many others do not have the safeguards needed to accept sensitive data at all. If you are generating data that is considered sensitive, be sure to investigate the repository’s capabilities before selecting it.

Definitions

There are a number of specialized terms that apply to the research data lifecycle. Below are some of the most important terms with common definitions.

Data

Even the term “data” itself carries a wide range of meanings depending on discipline and research method. According to NIH.gov, research data includes the recorded factual material commonly accepted in research communities as necessary to validate and replicate findings, regardless of whether the data are used to support scholarly publications. This definition would exclude preliminary analyses, completed case report forms, drafts of publications, plans for future research, peer reviews, or communications with colleagues. However, in some disciplines, research data can more broadly include physical objects and other information, such as specimens, archival materials, collections, and notebooks, that help support the provenance of the data.

Data Management Plan (DMP)

A document describing the actions to be taken over the course of the life cycle of a research dataset to ensure that it is well managed and will eventually be findable, accessible, interoperable and reusable by others. DMPs are required in proposals to federal funding agencies.

Digital Persistent Identifier (DPI)

Identifiers that enable consistent citation and reuse of scholarly works, datasets and funding sources, most commonly through the digital object identifier (DOI) system. Repositories assign DOIs when datasets are deposited, and a dataset can then be cited by its DOI in scholarly works just like any other reference. Definition adapted from DataCite.org.

FAIR Data

An increasingly common term describing efforts to make research data more Findable, Accessible, Interoperable, and Reusable (FAIR). Definition adapted from Go-Fair.org.

Metadata

Information about a research data set that is structured (often in machine-readable format) for purposes of search and retrieval. Metadata elements may include basic information (e.g. title, author, date created, etc.) and/or specific elements inherent to datasets (e.g., spatial coverage, time periods, provenance). Definition adapted from DataCurationNetwork.org.
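As a minimal sketch of what structured, machine-readable metadata can look like, the Python example below stores the elements named in the definition above in a dictionary, serializes it as JSON, and checks that a few basic elements are present. The element names, the example values, and the choice of which elements to treat as required are assumptions for illustration; real repositories define their own metadata schemas.

```python
import json

# Hypothetical record; element names mirror the examples in the definition above.
record = {
    "title": "Example stream temperature measurements",
    "author": ["Doe, J."],
    "date_created": "2024-05-01",
    "spatial_coverage": "Example County, Michigan",
    "time_period": {"start": "2022-01-01", "end": "2023-12-31"},
    "provenance": "Collected with field sensors; cleaned with project scripts.",
}

# Assumed minimum set of basic elements for this illustration.
REQUIRED = ("title", "author", "date_created")


def missing_elements(rec, required=REQUIRED):
    """Return the required element names that are absent or empty in the record."""
    return [name for name in required if not rec.get(name)]


print(json.dumps(record, indent=2))
print("Missing required elements:", missing_elements(record))
```

Because the record is structured rather than free text, a search system can index fields such as the time period or spatial coverage directly.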

Open Access

A general term typically used to describe free access to datasets or publications with no restrictions on accessibility, (re)use, and redistribution (distinct from public access, below). Definition adapted from UNESCO.org.

Preservation

The series of managed activities necessary to ensure continued access to, and readability of, research data sets for as long as necessary, with adequate security and risk mitigation. Definition adapted from DPCOnline.org.

Public Access

Data are readily discoverable and accessible to other researchers and people outside of the research project in which the data were generated. Public access is distinct from open access, defined above, in that something may be publicly available but may also have some restrictions on accessibility, reuse, or redistribution. Information on federal agencies’ public access policies can be found in the proposal development section of the guidance and policies page.

Repository

A specialized database that preserves, stewards, and provides access to many types of digital datasets in a variety of formats. Data repositories may focus on a specific field (such as ICPSR for Social Sciences), an institution (such as Deep Blue Data for U-M), or serve a general audience (such as Dryad). Definition adapted from CASRAI.org.  

Relevant U-M Policies/Guidelines

Several current university policies address aspects of research data use and management, including a Standard Practice Guideline published in June 2023.

Communities of Practice

Several U-M groups offer spaces where like-minded individuals can share information, experiences and ideas related to research data management and sharing.

U-M Communities of Practice
  • The Coderspaces community provides a forum and office hours to assist faculty, staff, and students with research methodology, statistics, data science applications, and computational programming for research.
  • The Data Analysis Networking Group (DANG!) is a forum for U-M post-docs, grad students, and other researchers to discuss how to analyze, present, and visualize their data.
  • The U-M Software / Data Carpentries is building a community of excellence at U-M around the area of reproducible data analysis.
  • MIDAS Reproducibility Hub promotes reproducible data science through raising awareness, celebrating best practices, enabling the scholarly investigation of reproducible research, and developing tools that can be widely adopted.
  • The Research Data Stewardship Initiative (RDSI) offers the community of U-M data support providers a forum to share challenges and learn strategies related to the changing research data management and sharing requirements.