Research Data Stewardship Resources

RDSI Informational Webinar

Why research data stewardship is important

Sharing research data is important for many reasons, and it can benefit not only research (and researchers) but also the public.

Benefits of Research Data Stewardship and Sharing
Further Reading

Relevant U-M Policies/Guidelines

Several current university policies address aspects of research data use and management, including a Standard Practice Guideline published in June 2023.

Funding Agency Research Data Policies & Resources

U.S. funding agency requirements for public access to research data have evolved rapidly over the past several years. Below are links to a number of individual policies. However, because agency policies change quickly, we strongly recommend that you confirm your funder’s policy before submitting any proposal. For agencies not on this list, see a curated list of data sharing requirements by federal agency provided by the Scholarly Publishing and Academic Resources Coalition (SPARC).

U-M Resources for NIH Data Management & Sharing Policy

Selecting a Data Repository

A research data repository is a type of database infrastructure that hosts, manages, and disseminates research data for the purposes of making it publicly available for people to discover, analyze and reuse.

Types of Data Repositories

There are three types of research data repositories:

  • Domain Repositories – Domain repositories specialize in data from a particular research field or discipline and/or in a particular type of data, and work closely with the research communities that use the data they host.
    • Example – ICPSR for the Social Sciences
  • Generalist Data Repositories – Generalist data repositories collect a wide variety of data types and do not limit deposits based on the institutional affiliation of the depositor.
  • Institutional Data Repositories – Institutional Data Repositories accept a wide variety of data, but place limits on who can deposit data based on institutional affiliation.
    • Example – Deep Blue Data, the Institutional Data Repository for the University of Michigan

Questions to Ask in Considering a Data Repository

Is there a Domain Repository that serves your research field?

Generally, the best place to deposit your research data is with a repository that is connected to your particular research field or discipline and/or specializes in hosting, disseminating and preserving the type of data that you have generated. Domain repositories work closely with the research communities who make use of the data they host to ensure that it is suitable and accessible for their specific needs. Depositing your data in a domain repository helps connect it with the community most likely to benefit from and engage with your work.

Not every research community has developed a domain repository of their own yet. In cases where a domain repository does not exist or it is not suitable for your data, you may want to consider a generalist or institutional data repository.        

Does your funding agency or institution have specific requirements for your data that you need to meet?

As you develop your grant application, pay special attention to the details of the funding agency’s data sharing requirements and to any additional stipulations of the specific program you are applying to. Take these requirements into account when selecting a data repository to host your data and when developing and documenting your data set for later reuse by others.

Some funding agencies provide guidance on selecting an appropriate repository for your data (for example, the NIH has a web page on Selecting a Data Repository). Your funding agency’s program officers may also be able to provide some guidance on selecting an appropriate repository.       

Are there any costs associated with depositing your data with a repository?

Data repositories, like other infrastructure that supports research, incur management and operational costs in providing their services. Some repositories charge fees to deposit or host data to cover these expenses. Be sure that you understand and budget for any deposit fees before naming a repository as your repository of choice in your grant application. Many funding agencies permit the inclusion of expenses related to managing or disseminating research data, but these expenses must be identified in advance so that they can be accounted for in the project’s budget.

Are there any limitations on how much or what kind of data a repository can accept, host, disseminate or preserve?

Different repositories may have different limits on the kinds and amounts of data they can accept. Although it can be challenging, it’s a good idea to estimate the size and complexity of the data you will need to share as you plan your project, so that you can determine whether the repository you are considering can reasonably host, disseminate and preserve it.

Does the repository take measures to preserve the data it hosts to ensure long term access?

Funding agencies generally expect your data to remain accessible well after the grant has ended. Each repository has its own level of commitment to, and resources dedicated to, preserving the data it stewards. When considering a data repository, read its preservation policy and note the preservation services it offers to confirm that they meet your needs.

Does the repository have a stable source of long term funding? What happens to your data if the repository shuts down?

Data sharing requirements are still relatively new in many disciplines, so the repositories and other infrastructure being developed to support researchers may not yet have a dedicated or reliable source of funding. In addition to exploring the repository’s capabilities, take some time to investigate its business model and financial stability. It’s also a good idea to understand the repository’s disaster management plan and what happens to the data it hosts if it ceases operations.

Does the repository support the use of unique identifiers? Can people cite your data set if they use it in their research?

Making your data publicly available should directly benefit you and enhance your professional reputation. A repository that assigns Digital Object Identifiers (DOIs) or other unique identifiers to the data sets it hosts enables others to cite a data set when they use it, just as they would cite a journal article or book.

Does your data include human subjects or sensitive information that would require limitations on access?

Not all data can be shared publicly. Data involving human subjects or other sensitive information may require additional considerations when choosing a repository. It’s important to work with your IRB to determine to what extent a data set can be shared outside the study team and what measures are needed to ensure that sharing it does not place subjects at risk. Repositories differ in their ability to handle sensitive data: some cannot accept sensitive data that has not been sufficiently de-identified, and many others lack the safeguards needed to accept sensitive data at all. If you are generating sensitive data, be sure to investigate a repository’s capabilities before selecting it.

Communities of Practice

Several U-M groups are available where like-minded individuals can share information, experiences and ideas related to research data management and sharing.

U-M Communities of Practice
  • The Coderspaces community provides a forum and office hours to assist faculty, staff, and students with research methodology, statistics, data science applications, and computational programming for research.
  • The Data Analysis Networking Group (DANG!) is a forum for U-M post-docs, grad students, and other researchers to discuss how to analyze, present, and visualize their data.
  • The U-M Software / Data Carpentries is building a community of excellence at U-M around the area of reproducible data analysis.
  • MIDAS Reproducibility Hub promotes reproducible data science through raising awareness, celebrating best practices, enabling the scholarly investigation of reproducible research, and developing tools that can be widely adopted.
  • The Research Data Stewardship Initiative (RDSI) offers the community of U-M data support providers with a forum to share challenges and learn strategies related to the changing research data management and sharing requirements.

Definitions

There are a number of specialized terms that apply to the research data lifecycle. Below are some of the most important terms with common definitions.

Data

Even the term “data” itself carries a wide range of meanings depending on discipline and research method. According to NIH.gov, research data includes the recorded factual material commonly accepted in research communities as necessary to validate and replicate findings, regardless of whether the data are used to support scholarly publications. This definition would exclude preliminary analyses, completed case report forms, drafts of publications, plans for future research, peer reviews, or communications with colleagues. However, in some disciplines, research data can more broadly include physical objects and other information, such as specimens, archival materials, collections, and notebooks, that help support the provenance of the data.

Data Management Plan (DMP)

A document describing the actions to be taken over the life cycle of a research dataset to ensure that it is well managed and will eventually be findable, accessible, interoperable and reusable by others. Most federal funding agencies now require DMPs, or other information about data management and stewardship, in proposals.

Digital Persistent Identifier (DPI)

These enable consistent citation and reuse of scholarly works, datasets and funding sources, most commonly through the Digital Object Identifier (DOI) system. Many repositories assign DOIs when datasets are deposited, and a dataset can then be cited in scholarly works by its DOI, like any other reference. Definition adapted from DataCite.org.
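As a rough illustration of how a DOI becomes a citable reference, the Python sketch below normalizes a DOI written in several common forms into a resolvable https://doi.org/ link and assembles a simple dataset citation. The citation layout (Creators (Year). Title. Version. Publisher. Link) loosely follows common data citation practice and is an assumption, not a required style. The function names, authors, title, and the DOI 10.1234/example are all hypothetical.

```python
# Common ways a DOI is written; all resolve through the https://doi.org/ proxy.
_DOI_PREFIXES = ("https://doi.org/", "http://doi.org/", "https://dx.doi.org/", "doi:")


def doi_url(doi: str) -> str:
    """Normalize a DOI (bare, 'doi:'-prefixed, or URL form) to a https://doi.org/ link."""
    doi = doi.strip()
    for prefix in _DOI_PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    # Every DOI begins with the "10." directory indicator.
    if not doi.startswith("10."):
        raise ValueError(f"not a DOI: {doi!r}")
    return "https://doi.org/" + doi


def cite_dataset(creators, year, title, publisher, doi, version=None):
    """Build a simple dataset citation: Creators (Year). Title. [Version X.] Publisher. Link"""
    parts = [f"{'; '.join(creators)} ({year}). {title}."]
    if version:
        parts.append(f"Version {version}.")
    parts.append(f"{publisher}.")
    parts.append(doi_url(doi))
    return " ".join(parts)


# Hypothetical example: authors, title, and DOI are placeholders.
print(cite_dataset(["Doe, J.", "Roe, R."], 2023, "Example survey data",
                   "Deep Blue Data", "doi:10.1234/example"))
# Doe, J.; Roe, R. (2023). Example survey data. Deep Blue Data. https://doi.org/10.1234/example
```

Linking through the doi.org resolver rather than the repository’s own page URL is the point of a persistent identifier: the link keeps working even if the repository reorganizes its website.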

FAIR Data

An increasingly common term describing efforts to make research data more Findable, Accessible, Interoperable, and Reusable (FAIR). Definition adapted from Go-Fair.org.

Metadata

Information about a research data set that is structured (often in machine-readable format) for purposes of search and retrieval. Metadata elements may include basic information (e.g., title, author, date created) and/or elements specific to datasets (e.g., spatial coverage, time periods, provenance). Definition adapted from DataCurationNetwork.org.
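To make “structured, machine-readable” concrete, here is a minimal Python sketch of a metadata record for a hypothetical dataset, serialized as JSON, with a check that a small set of core fields is present. The field names are simplified and only loosely modeled on the mandatory properties of the DataCite Metadata Schema (identifier, creator, title, publisher, publication year, resource type). They are not the official DataCite format, and every value shown is a placeholder.

```python
import json

# Simplified field names loosely modeled on DataCite's mandatory properties.
# This is NOT the official DataCite JSON format; it is an illustration only.
REQUIRED_FIELDS = ("identifier", "creators", "title", "publisher",
                   "publication_year", "resource_type")


def missing_fields(record: dict) -> list:
    """Return the required fields that are absent or empty in a metadata record."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


# A hypothetical dataset record: basic descriptive elements plus
# dataset-specific elements such as spatial and temporal coverage.
record = {
    "identifier": "10.1234/example",
    "creators": ["Doe, J."],
    "title": "Example stream temperature measurements",
    "publisher": "Deep Blue Data",
    "publication_year": 2023,
    "resource_type": "Dataset",
    "spatial_coverage": "Example region, Michigan",
    "temporal_coverage": "2019-01-01/2021-12-31",
}

print(missing_fields(record))        # []
print(json.dumps(record, indent=2))  # machine-readable form for search and retrieval
```

Keeping descriptive fields in a consistent, machine-readable structure is what allows repositories and search services to index a dataset; the required-field check mirrors the kind of validation a repository may run at deposit time.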

Open Access

A general term typically used to describe free access to datasets or publications with no restrictions on accessibility, (re)use, and redistribution (distinct from public access, below). Definition adapted from UNESCO.org.

Preservation

The series of managed activities necessary to ensure continued access to, and readability of, research data sets for as long as necessary, with adequate security and risk mitigation. Definition adapted from DPCOnline.org.

Public Access

Data are readily discoverable and accessible to other researchers and people outside of the research project in which the data were generated. Public access is distinct from open access, defined above, in that something may be publicly available but may also have some restrictions on accessibility, reuse, or redistribution. Information on federal agencies’ public access policies can be found in the proposal development section of the guidance and policies page.

Repository

A specialized database that preserves, stewards, and provides access to many types of digital datasets in a variety of formats. Data repositories may focus on a specific field (such as ICPSR for Social Sciences), an institution (such as Deep Blue Data for U-M), or serve a general audience (such as Dryad). Definition adapted from CASRAI.org.  

Data Management During Phases of the Research Lifecycle

Across all stages of the research life cycle and all fields of study, researchers should consider the potential long-term impacts on the eventual storage and preservation of research data. Below are some resources by stage of the research life cycle that can serve as an entry point into data stewardship practices that will help researchers save time, address funder requirements, and ultimately maximize the impact of their research.

Setting up a study with data stewardship in mind, for example by establishing clear protocols for collecting and storing the data it generates, will have substantial downstream benefits. In addition, most agencies now require data management plans (DMPs) or other information about data management and stewardship as part of the proposal submission process.

Below is a selection of resources that can help researchers get started as they think through effective data practices while developing their study.

Resources by Study or Proposal Element
Common Data Elements (CDEs)
CDEs are structured human- and machine-readable definitions of data elements for use in research and for other purposes. NIH has a Common Data Elements Repository to help researchers identify standardized terms or concepts used across studies, ranging from surveys to disease nomenclature.
Metadata
Protocols
  • Curating and sharing individual protocols ensures consistent research data practices within research groups and also makes the protocols easier to share with the wider research community.
  • U-M has an institutional subscription to an electronic lab notebook provider that allows researchers to enjoy the benefits, efficiencies, and long-term cost savings of centralized, paperless protocols and workflows.
  • Protocols.io is another such platform for researchers to develop and share experimental protocols.
Data Management Plans

Most agencies now require data management plans (DMPs) or other information about data management and stewardship as part of the proposal submission process.

Proposal Budgeting
Along with new requirements for data management, funding agencies are increasingly allowing data sharing costs to be included as direct costs in proposal budgets. ORSP provides high-level budget and cost guidance for proposals as it relates to direct costs associated with a project. Absent any prohibition from the funding agency, and in accordance with the applicable terms and conditions of the underlying grant, costs associated with data curation, data formatting, data de-identification, preparation of metadata, and repository data deposition fees may be planned for and included in the proposal as direct costs.

 

Discipline-Specific Guidance Resources
Clinical Research

U-M researchers can get assistance with the design, conduct, and analysis of clinical trials, including data management and software development through the Statistical Analysis of Biomedical and Educational Research Group (SABER).

Additional data collection guidance and an online course on the fundamentals of data management for clinical research are available from the Michigan Institute for Clinical & Health Research (MICHR).

Qualitative Research
Qualitative research generates non-numerical data that often require contextual information, which poses additional data management challenges. The Data Curation Network provides a primer on data types in qualitative research to help researchers navigate data needs in these fields.
Computational Research
Computational research increasingly must grapple with making code and software available in addition to research data. Several sources, including NIH, Software Carpentry, and the Software Sustainability Institute, provide guides to help researchers navigate these challenges. If necessary, please consult U-M Innovation Partnerships with questions about licensing options, best practices, and IP guidance when releasing code as open source.
Humanities
Data needs are increasing across the humanities with the rise of digitization. The Digital Humanities Curation Guide provides a compilation of resources to help digital humanities scholars with data curation challenges.
Diversity Scholarship
An open data toolkit for diversity scholars to guide best practices in collecting, managing, utilizing, sharing and curating research data for the public good is available from the U-M Library.

Researchers have a number of factors to consider when managing research data, especially when dealing with potentially sensitive information or certain types of regulated data. U-M has a number of resources to help researchers navigate these challenges, depending on the types of data being generated.

Overall Safety/Security Guidance Resources
International Collaborations and Export Controls
Some research data may be restricted in whether and how they can be shared with foreign countries, persons, or entities. U-M Export Controls can help researchers ensure compliance with all applicable regulations and create technology control plans (TCPs), if necessary.
Research Data Security
Several types of research data require specific protections based on various university and legal requirements. U-M’s Research Information Security Oversight (RISO) Program works with PIs to determine which additional controls, if any, are required.
Safe Computing
To protect yourself and your research data from phishing attacks and other electronic vulnerabilities, U-M provides high-level safe computing resources, including a sensitive data guide.

 

Research Related to Human Subjects Resources
General Guidance
To help researchers maintain human subject data securely with the appropriate level of anonymity, confidentiality, or de-identification, refer to the human subject data security guidance (including a checklist).
Compliance Reviews
Researchers can receive objective analysis and evaluation of research compliance, including data security and confidentiality for human subjects studies, from the Office of Research Compliance Review.
Data Transfer Agreements (Michigan Medicine)
When working with protected health data from Michigan Medicine, data transfer agreements associated with individual-level patient/participant data or biospecimens are reviewed by the Medical School Data Release Committee.
Diversity, Equity and Inclusion
An introduction to the intersections between DEI and research data use is available in a 2020 report on Principles for Advancing Equitable Data Practice.
Student Data
For educational research sponsored by the U.S. Department of Education, U-M Research Ethics and Compliance provides additional guidance on the Family Educational Rights and Privacy Act (FERPA) and other regulations.

Short-Term Research Data Management and Storage

Responsibly and strategically managing research data streams during a study can go a long way towards improving the long-term impact and replicability of your research. There are many resources available to U-M researchers to assist in various aspects of research data management and/or analysis across a number of disciplines or approaches. Some university-wide examples are provided below.

 

Resources by Need
Consulting Services
Data Storage Services (General)
Data Storage Services (Large Needs)
For large amounts of data and/or large files, ITS Advanced Research Computing (ARC) offers a number of active research data storage services (e.g., OSiRIS, Locker, and Turbo).
High-Performance Computing
For researchers requiring high-performance computing, ARC provides a number of computational and data storage resources, including the U-M Research Computing Package. Many schools and colleges, including the College of Engineering, Medical School, and LSA, also offer services in partnership with ITS.
Research Cores
For data management needs specific to your discipline, a number of other services are available. For example, many of the ~100 research cores across U-M offer data services related to the equipment and/or analyses they provide.
Using Generative AI
The Michigan Institute for Data Science developed a guide on how generative AI can be used in multiple aspects of your research, based on guidelines published by journals, funding agencies and professional societies; the guide also discusses other benefits and risks.

 

Data Sharing and Long-Term Preservation

Research data needs change as researchers transition from actively managing a project and/or analyzing data to completing it and/or publishing it. Best practices include archiving or preservation to ensure public access, documentation of metadata to improve discoverability, and increasingly, annotation and deposition of code to ensure reproducibility. The following are examples of resources available to help researchers ensure their data is accessible over the long term.

 

Resources by Need
General Guidance
General guidance on sharing and preserving data, including how to select a repository, is available as a research guide from the U-M Library. Subject-specific guides are also available for health sciences, engineering, and qualitative sciences.
Long-Term Data Storage
For larger data sets, Advanced Research Computing (ARC)’s Data Den Research Archive can be combined with other services (e.g., Globus, for which U-M has an institutional subscription) to enable long-term archiving of data that isn’t actively being accessed.
Repositories (Digital Research Data)
Repositories (Physical Specimens)
  • In many disciplines, preserving research data can also include permanently archiving physical specimens. U-M has a number of world-class facilities and museums that assist researchers with accessing or depositing specimens in collections.
  • The Research Museums Center has staff who can assist with preserving samples across anthropological archaeology, botany (and associated disciplines), paleontology, and zoology.
  • The U-M Central Biorepository provides storage options for biospecimens and data associated with or derived from them.
Software and Code Sharing
To make the computational code and/or software used to generate or analyze research data publicly available, place the code in a well-known public repository such as GitHub, SourceForge, Bitbucket, or similar. These repositories should be actively maintained, with updates, basic usage instructions, appropriate licensing terms, and an associated copyright notice. Consult U-M Innovation Partnerships about best practices, options, approaches, and guidance when releasing code as open source.

 

Resources by Type of Protection
Copyrights
For information about copyright basics and Creative Commons licenses, researchers can review copyright guides or contact the U-M Library’s Copyright Services team directly.
Data Use Agreements (DUAs)
Publication Repositories
  • Researchers can deposit publications (before or after final publication) into publicly available repositories to satisfy funder public access requirements for publications, or to simply make scholarly work more widely accessible.
  • U-M provides an institutional repository called Deep Blue Documents to deposit articles, chapters, dissertations, conference presentations, media, and other work produced by the U-M community. 
  • Researchers may also choose one of the many disciplinary repositories listed in the Open Access Directory.
Intellectual Property
Intellectual property, technology licensing, material transfer agreements, and often data use agreements with corporate sponsors are handled by U-M Innovation Partnerships.
Publisher Data Policies

Publisher requirements for public access to research data have evolved rapidly over the past several years. For example, many journals have adopted some or all of the Transparency and Openness Promotion (TOP) Guidelines, which include modular standards for data citation and data availability.

Below are links to some large publishers’ policies and other resources. However, because these policies change quickly, we strongly recommend that you confirm your individual journal’s policy before submitting your publication, even if you have published in that journal recently.

SpringerNature; Wiley; PLOS; Elsevier; Taylor & Francis; SAGE; ICMJE

Open Access Publishing

Many authors choose to publish journal articles or books that are available to any reader at no cost (i.e., “open access” publications). In these cases, authors themselves are often responsible for the publishing costs.

The U-M Library has negotiated deals with many scholarly publishers to provide discounts for authors on article processing charges and also offers up to $15,000 for open access monographs in the humanities.