Research Data Stewardship Resources

RDSI Informational Webinar

Why research data stewardship is important

Sharing research data is important for many reasons: it benefits not only research (and researchers) but the public as well.

Benefits of Research Data Stewardship and Sharing

Making data publicly available can improve reproducibility and replicability (National Academies of Sciences, Engineering, and Medicine)
Data that are Findable, Accessible, Interoperable, and Reusable (FAIR) help producers and users of research data (GO FAIR Initiative)
Responsible stewardship of research data can help address many DEI issues that arise during research projects (Principles for Advancing Equitable Data Practice)
Accessible data inspires more public trust in research (Pew Research Center)
Papers that link to accessible research data get cited more (PLOS One)

Relevant U-M Policies/Guidelines

Several current university policies address aspects of research data use and management, including a Standard Practice Guideline published in June 2023.

Relevant U-M Policies

Research Data Stewardship Policy (303.06)
Institutional Data Resource Management Policy (601.12)
Information Security (601.27)
Technology Transfer Policy (303.04)
Who Holds Copyright at or in Affiliation with the University of Michigan (601.28)
Policy Statement on the Integrity of Scholarship (303.03)

Funding Agency Research Data Policies & Resources

U.S. funding agency requirements for public access to research data have been evolving rapidly over the past several years. Below are links to a number of individual policies; however, given the fast-changing landscape around agency policies, we strongly recommend you confirm your funder’s policy before submitting any proposal. For agencies not in this list, see the curated list of data sharing requirements by federal agency provided by the Scholarly Publishing and Academic Resources Coalition (SPARC).

U-M Resources for NIH Data Management & Sharing Policy

Federal Funding Agency Research Data Policies and Related Information

Agency	Policy	Effective date	Further information
NIH	Changes to Data Management and Sharing (DMS) Plan Progress Reporting and the Submission of Revised DMS Plans Are Coming on October 1	Oct 1, 2024
NIH	Policy for Data Management and Sharing	Jan 25, 2023	Policy implementation guidance from NIH; NIH Sharing Policies
NSF	Data Sharing Policy	June 1, 2020	Dissemination and Sharing of Research Results from NSF
DOD	Plan to Establish Public Access to the Results of Federally Funded Research	Feb 2015
Dept of Energy	Policy for Digital Research Data Management	Oct 1, 2015	DOE Public Access Plan
Dept of Education	IES Policy Regarding Public Access to Research	October 21, 2016	Implementation Guide for Public Access to Research Data
NEH	Data Management Plans for NEH Office of Digital Humanities Proposals and Awards	June 2018
NASA	Scientific Information policy for the Science Mission Directorate [DRAFT]	Nov 2021	Open-Source Science Initiative

Selecting a Data Repository

A research data repository is a type of database infrastructure that hosts, manages, and disseminates research data for the purposes of making it publicly available for people to discover, analyze and reuse.

Types of Data Repositories

There are three types of research data repositories:

Domain / Subject Data Repositories – Domain repositories focus on hosting and disseminating specific types of data or supporting particular fields of research.

Examples – ICPSR for Social Science data, Gene Expression Omnibus (GEO) for Genomics data, or the Virtual Astronomical Observatory (VAO) for Astronomy data.

Generalist Data Repositories – Generalist data repositories collect a wide variety of data types and do not limit deposits based on the institutional affiliation of the depositor.

Examples – Dryad, FigShare, Open Science Framework

Institutional Data Repositories – Institutional Data Repositories accept a wide variety of data, but place limits on who can deposit data based on institutional affiliation.

Example – Deep Blue Data, the Institutional Data Repository for the University of Michigan

Questions to Ask in Considering a Data Repository

Is there a Domain Repository that serves your research field?

Generally, the best place to deposit your research data is a repository that is connected to your particular research field or discipline and/or specializes in hosting, disseminating and preserving the type of data that you have generated. Domain repositories work closely with the research communities that use the data they host to ensure that the data are suitable and accessible for those communities' specific needs. Depositing your data into a domain repository helps connect it with the community that is most likely to benefit from and engage with your work.

Not every research community has developed a domain repository of their own yet. In cases where a domain repository does not exist or it is not suitable for your data, you may want to consider a generalist or institutional data repository.

Does your funding agency or institution have specific requirements for your data that you need to meet?

As you are developing your grant application, pay special attention to the details of the funding agency's data sharing requirements and to any additional stipulations made in the specific program that you are applying to. Take these requirements into consideration when selecting a data repository to host your data and when developing and documenting your data set for later reuse by others.

Some funding agencies provide guidance on selecting an appropriate repository for your data (for example, the NIH has a web page on Selecting a Data Repository). Your funding agency’s program officers may also be able to provide some guidance on selecting an appropriate repository.

Are there any costs associated with depositing your data with a repository?

Data repositories, like other infrastructure that supports research, incur management and operational costs from providing their services. Some repositories charge fees to deposit or host data in order to cover their expenses. Be sure that you understand and budget for any fees associated with depositing data into a repository before identifying it as your repository of choice in your grant application. Many funding agencies permit the inclusion of expenses related to managing or disseminating the data from the research, but these expenses must be identified and accounted for in order to be addressed effectively in the project’s budget.

Are there any limitations on how much or what kind of data a repository can accept, host, disseminate or preserve?

Different repositories may have different limits or constraints on what kind of and how much data they can accept. Though it may be a bit challenging to do, it’s a good idea to estimate the size and complexity of the data that you will need to share as you plan your project in order to determine if the repository you are considering can reasonably host, disseminate and preserve it.
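As a rough illustration of the estimate described above, the following Python sketch adds up planned file counts and average file sizes and compares the total to a repository limit. The file groups, sizes, and the 100 GB limit are all hypothetical placeholders, not figures from any particular repository; check the repository's own documentation for its actual limits.

```python
# Rough storage estimate for planning purposes; all figures are hypothetical.

def estimate_total_gb(file_groups):
    """Sum estimated storage in gigabytes.

    file_groups: list of (label, number_of_files, average_size_in_mb) tuples.
    """
    total_mb = sum(count * avg_mb for _label, count, avg_mb in file_groups)
    return total_mb / 1000  # decimal gigabytes (1 GB = 1000 MB)


def fits_repository(total_gb, limit_gb):
    """Return True if the estimate is within the repository's stated limit."""
    return total_gb <= limit_gb


planned_files = [
    ("survey responses (CSV)", 12, 5),       # 12 files, about 5 MB each
    ("interview audio (WAV)", 40, 600),      # 40 files, about 600 MB each
    ("microscopy images (TIFF)", 2000, 25),  # 2000 files, about 25 MB each
]

total = estimate_total_gb(planned_files)
print(f"Estimated total: {total:.1f} GB")
print("Within a hypothetical 100 GB limit:", fits_repository(total, 100))
```

An estimate like this is only a starting point; it may be worth adding a margin for derived files, documentation, and multiple versions of the data.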

Does the repository take measures to preserve the data it hosts to ensure long term access?

Funding agencies generally expect that your data will be accessible well after the grant has ended. Every repository has its own level of commitment and dedicated resources to the preservation of the data that it stewards. When considering a data repository, be sure to read through its preservation policy and make note of the associated preservation services that it offers to be sure that your needs are addressed.

Does the repository have a stable source of long term funding? What happens to your data if the repository shuts down?

Data sharing requirements are still relatively new in many disciplines and so the repositories and other infrastructure being developed to support researchers may not have a dedicated or reliable source of funding in place. In addition to exploring the capabilities of the repository, take some time to investigate its business model and its financial stability. It’s also a good idea to get an understanding of the repository’s disaster management plan and what happens to the data it hosts if it ceases operations.

Does the repository support the use of unique identifiers? Can people cite your data set if they use it in their research?

Making your data publicly available to others should directly benefit you and increase your professional reputation. A repository that assigns Digital Object Identifiers (DOIs), or other unique identifiers, to the data sets that it hosts enables others to cite a data set when they use it, just as they would cite a journal article or book.
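To show how a DOI makes a data set citable, here is a minimal Python sketch that assembles a citation string from a few descriptive fields. The creator, year, title, publisher, identifier ordering is one common pattern, assumed here for illustration; the names, title, and DOI are invented placeholders, and your repository or target journal may prescribe a different style.

```python
def format_data_citation(creators, year, title, publisher, doi):
    """Build a citation string: Creators (Year). Title. Publisher. DOI URL."""
    names = "; ".join(creators)
    return f"{names} ({year}). {title}. {publisher}. https://doi.org/{doi}"


# All values below are hypothetical placeholders.
citation = format_data_citation(
    creators=["Doe, Jane", "Roe, Richard"],
    year=2024,
    title="Example Survey Dataset",
    publisher="Example Data Repository",
    doi="10.1234/example-doi",
)
print(citation)
```

Many repositories display a ready-made citation on the data set's landing page, which is usually the safest text to copy.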

Does your data include human subjects or sensitive information that would require limitations on access?

Not all data can be shared publicly. Data that include human subjects or other sensitive information may be subject to additional considerations when deciding on a repository. It’s important to work with your IRB to determine to what extent a data set can be shared outside of the study team and what measures will need to be taken to ensure that sharing the data set does not place subjects at risk. Different repositories have different capabilities in handling sensitive data. Some are not able to accept sensitive data that has not been sufficiently deidentified. Many other repositories do not have the safeguards needed to accept sensitive data. If you are generating data that is considered sensitive in nature, be sure to investigate the repository’s capabilities before selecting it.

Communities of Practice

Several U-M groups are available where like-minded individuals can share information, experiences and ideas related to research data management and sharing.

U-M Communities of Practice

The Coderspaces community provides a forum and office hours to assist faculty, staff, and students with research methodology, statistics, data science applications, and computational programming for research.
The Data Analysis Networking Group (DANG!) is a forum for U-M post-docs, grad students, and other researchers to discuss how to analyze, present, and visualize their data.
The U-M Software / Data Carpentries is building a community of excellence at U-M around the area of reproducible data analysis.
MIDAS Reproducibility Hub promotes reproducible data science through raising awareness, celebrating best practices, enabling the scholarly investigation of reproducible research, and developing tools that can be widely adopted.
The Research Data Stewardship Initiative (RDSI) offers the community of U-M data support providers a forum to share challenges and learn strategies related to the changing research data management and sharing requirements.

Definitions

There are a number of specialized terms that apply to the research data lifecycle. Below are some of the most important terms with common definitions.

Data

Even the term “data” itself carries a wide range of meanings depending on discipline and research method. According to NIH.gov, research data includes the recorded factual material commonly accepted in research communities as necessary to validate and replicate findings, regardless of whether the data are used to support scholarly publications. This definition would exclude preliminary analyses, completed case report forms, drafts of publications, plans for future research, peer reviews, or communications with colleagues. However, in some disciplines, research data can more broadly include physical objects and other information, such as specimens, archival materials, collections, and notebooks, that help support the provenance of the data.

Data Management Plan (DMP)

A document describing the actions to be taken over the course of the life cycle of a research dataset to ensure that it is well managed and will eventually be findable, accessible, interoperable and reusable by others. Most federal funding agencies now require a DMP, or similar information, in proposals.

Digital Persistent Identifier (DPI)

These enable consistent citation and reuse of scholarly works, datasets and funding sources, most commonly using the digital object identifier (DOI) system. Repositories assign DOIs when datasets are deposited. Datasets are citable in scholarly works using DOIs similar to any other reference. Definition adapted from DataCite.org.

FAIR Data

An increasingly common term describing efforts to make research data more Findable, Accessible, Interoperable, and Reusable (FAIR). Definition adapted from Go-Fair.org.

Metadata

Information about a research data set that is structured (often in machine-readable format) for purposes of search and retrieval. Metadata elements may include basic information (e.g. title, author, date created, etc.) and/or specific elements inherent to datasets (e.g., spatial coverage, time periods, provenance). Definition adapted from DataCurationNetwork.org.
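As an illustration of structured, machine-readable metadata, the Python sketch below stores the elements named in this definition in a dictionary, checks that the basic ones are present, and prints the record as JSON. The field names and values are assumptions made for this example and do not follow any specific metadata standard; a real deposit should use the schema your repository or discipline requires.

```python
import json

# Basic elements treated as required in this sketch (an assumption).
REQUIRED_FIELDS = ("title", "author", "date_created")


def missing_fields(record, required=REQUIRED_FIELDS):
    """Return the required fields that are absent or empty in the record."""
    return [name for name in required if not record.get(name)]


# Hypothetical record; every value is a placeholder.
record = {
    "title": "Example Stream Temperature Measurements",
    "author": ["Doe, Jane"],
    "date_created": "2024-05-01",
    "spatial_coverage": "Hypothetical watershed, Michigan, USA",
    "time_period": {"start": "2022-06-01", "end": "2023-09-30"},
    "provenance": "Collected with field sensors; cleaned with a lab script.",
}

print(json.dumps(record, indent=2))
print("Missing required fields:", missing_fields(record))
```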

Open Access

A general term typically used to describe free access to datasets or publications with no restrictions on accessibility, (re)use, and redistribution (distinct from public access, below). Definition adapted from UNESCO.org.

Preservation

The series of managed activities necessary to ensure continued access to, and readability of, research data sets for as long as necessary, with adequate security and risk mitigation. Definition adapted from DPCOnline.org.
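One common example of such a managed activity is a fixity check: recording a checksum for a file and later recomputing it to confirm that the file has not changed. The Python sketch below illustrates the idea with SHA-256 on a small temporary file; the file name and contents are placeholders, and this is not a description of how any particular repository implements preservation.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_unchanged(path, recorded_checksum):
    """Return True if the file still matches the checksum recorded earlier."""
    return sha256_of_file(path) == recorded_checksum


# Demonstration with a temporary placeholder file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.csv")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("id,value\n1,3.14\n")
    recorded = sha256_of_file(path)
    print("Unchanged:", is_unchanged(path, recorded))

    # Simulate an unintended modification.
    with open(path, "a", encoding="utf-8") as handle:
        handle.write("2,2.71\n")
    print("Unchanged after edit:", is_unchanged(path, recorded))
```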

Public Access

Data are readily discoverable and accessible to other researchers and people outside of the research project in which the data were generated. Public access is distinct from open access, defined above, in that something may be publicly available but may also have some restrictions on accessibility, reuse, or redistribution. Information on federal agencies’ public access policies can be found in the proposal development section of the guidance and policies page.

Repository

A specialized database that preserves, stewards, and provides access to many types of digital datasets in a variety of formats. Data repositories may focus on a specific field (such as ICPSR for Social Sciences), an institution (such as Deep Blue Data for U-M), or serve a general audience (such as Dryad). Definition adapted from CASRAI.org.

Data Management During Phases of the Research Lifecycle

Across all stages of the research life cycle and all fields of study, researchers should consider the potential long-term impacts on the eventual storage and preservation of research data. Below are some resources by stage of the research life cycle that can serve as an entry point into data stewardship practices that will help researchers save time, address funder requirements, and ultimately maximize the impact of their research.

Setting up a study with data stewardship in mind–such as clear protocols for the collection and storage of data generated–will have tremendous downstream benefits. In addition, most agencies now require data management plans (DMPs) or other information about data management and stewardship as part of the proposal submission process.

Below are a selection of resources that can help researchers get started as they think through effective data practices while developing their study.

Study or Proposal Element	Resources
Common Data Elements (CDEs)	CDEs are structured human and machine-readable definitions of data elements for use in research and for other purposes. NIH has a Common Data Elements Repository to help researchers identify standardized terms or concepts used across studies ranging from surveys to disease nomenclature.
Metadata	FAIRSharing.org provides a database of metadata standards across disciplines for researchers, which helps improve data sharing practices. The U-M Library also developed guidance for best practices around data documentation and metadata standards.
Protocols	Curating and sharing individual protocols ensures consistency in research data practices within individual research groups and also makes it easier to share them with the wider research community. U-M has an institutional subscription to an electronic lab notebook provider that allows researchers to enjoy the benefits, efficiencies, and long-term cost savings of centralized, paperless protocols and workflows. Protocols.io is another such platform for researchers to develop and share experimental protocols.
Data Management Plans	Most agencies now require data management plans (DMPs) or other information about data management and stewardship as part of the proposal submission process. See the informational bulletin Writing a Data Management Plan. The U-M Library provides broad guidance about data management plans, including subject-specific guides for engineering, social sciences, and health sciences. The DMPTool is a free, open-source tool that helps researchers create standardized data management plans. General data management plan templates are available to download and fill out to help you plan for your data: General Data Management Plan template for a Project; General Data Management Plan template for a Project – Qualitative / Humanities; General Data Management Plan template for a Research Team or Laboratory.
Proposal Budgeting	Along with new requirements for data management, funding agencies are increasingly allowing data sharing costs to be included as direct costs in proposal budgets. ORSP provides high-level budget and cost guidance for proposals as it relates to direct costs associated with a project. Absent any prohibition from the funding agency, and in accordance with the applicable terms and conditions of the underlying grant, costs associated with data curation, data formatting, data de-identification, preparation of metadata, and repository data deposition fees may be planned for and included in the proposal as direct costs.

Discipline-Specific Guidance	Resources
Clinical Research	U-M researchers can get assistance with the design, conduct, and analysis of clinical trials, including data management and software development through the Statistical Analysis of Biomedical and Educational Research Group (SABER). Additional data collection guidance and an online course on the fundamentals of data management related to clinical research is available from the Michigan Institute for Clinical & Health Research (MICHR).
Qualitative Research	Qualitative research involves non-numerical data and often requires contextual information that poses additional data management challenges. The Data Curation Network provides a primer on data types in qualitative research to help researchers navigate data needs in these fields.
Computational Research	Computational research increasingly must grapple with making code and software available, in addition to research data. Several sources provide guides for researchers on how to navigate these challenges, including NIH, Software Carpentry, and the Software Sustainability Institute. If necessary, please consult with U-M Innovation Partnerships for questions about licensing options, best practices, and guidance about IP when releasing code open source.
Humanities	Data needs are increasing across the humanities with the rise of digitization. The Digital Humanities Curation Guide provides a compilation of resources to help digital humanities scholars with data curation challenges.
Diversity Scholarship	An open data toolkit for diversity scholars to guide best practices in collecting, managing, utilizing, sharing and curating research data for the public good is available from the U-M Library.

Researchers have a number of factors to consider when managing research data, especially when dealing with potentially sensitive information or certain types of regulated data. U-M has a number of resources available to help researchers navigate these challenges depending on which types of data are being generated.

Overall Safety/Security Guidance	Resources
International Collaborations and Export Controls	Some research data may have restrictions on if/how they can be shared with foreign countries, persons, or entities. U-M Export Controls can help researchers ensure compliance with all appropriate regulations and create technology control plans (TCPs), if necessary.
Research Data Security	Several types of research data require specific protections based on various university and legal requirements. U-M’s Research Information Security Oversight (RISO) Program works with PIs to determine which, if any, additional controls are required.
Safe Computing	To protect yourself and your research data from phishing attacks or other electronic vulnerabilities, U-M provides high-level safe computing resources, including a sensitive data guide.

Research Related to Human Subjects	Resources
General Guidance	To help researchers maintain human subject data securely with the appropriate level of anonymity, confidentiality, or de-identification, refer to human subject data security guidance (including a checklist).
Compliance Reviews	Researchers can receive objective analysis and evaluation of research compliance, including data security and confidentiality for human subjects studies, from the Office of Research Compliance Review.
Data Transfer Agreements (Michigan Medicine)	When working with protected health data from Michigan Medicine, data transfer agreements associated with individual-level patient/participant data or biospecimens are reviewed by the Medical School Data Release Committee.
Diversity, Equity and Inclusion	An introduction to the intersections between DEI and research data use is available in a 2020 report on Principles for Advancing Equitable Data Practice.
Student Data	For educational research sponsored by the U.S. Department of Education, U-M Research Ethics and Compliance provides additional guidance around the Family Educational Rights and Privacy Act (FERPA) and other regulations.
Informed Consent	The informed consent process is important for ensuring participant safety and is also an important factor when sharing participant data later. Learn more in this informational bulletin from RDSI: Informed Consent and Data Sharing.
De-identifying Data	De-identifying data, which is often a step toward anonymizing data, involves minimizing the risk that an individual participating in a research study could be identified. It is a useful strategy for protecting the privacy of study participants. Learn more in this informational bulletin from RDSI: De-identifying Data.
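As a simplified illustration of de-identification, the Python sketch below drops a set of direct identifiers from a record and replaces the participant ID with a keyed hash. The column names, the sample record, and the key are hypothetical. Removing direct identifiers does not by itself make a data set anonymous, because combinations of the remaining fields can still point to individuals, so treat this as a sketch of one step and follow your IRB's guidance for an actual study.

```python
import hashlib
import hmac

# Hypothetical column names treated as direct identifiers in this sketch.
DIRECT_IDENTIFIERS = {"name", "email", "phone"}


def pseudonymize_id(participant_id, secret_key):
    """Replace an ID with a keyed hash so the original is not stored."""
    digest = hmac.new(secret_key, participant_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def deidentify(row, secret_key):
    """Drop direct identifiers and pseudonymize the participant ID."""
    cleaned = {k: v for k, v in row.items() if k not in DIRECT_IDENTIFIERS}
    cleaned["participant_id"] = pseudonymize_id(row["participant_id"], secret_key)
    return cleaned


# The key must be kept secret and stored separately from the shared data.
key = b"replace-with-a-secret-kept-outside-the-dataset"
row = {
    "participant_id": "P-0042",
    "name": "Jane Doe",
    "email": "jane@example.org",
    "phone": "555-0100",
    "age_group": "30-39",
    "score": 17,
}
print(deidentify(row, key))
```

Using a keyed hash rather than a plain hash means that someone without the key cannot recompute the pseudonym from a guessed ID, while the study team can still link records from the same participant.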

Short-Term Research Data Management and Storage

Responsibly and strategically managing research data streams during a study can go a long way towards improving the long-term impact and replicability of your research. There are many resources available to U-M researchers to assist in various aspects of research data management and/or analysis across a number of disciplines or approaches. Some university-wide examples are provided below.

Managing & Organizing Data – Informational Bulletin

Need	Resources
Consulting Services	Consulting services that help computational researchers implement data analytics and workflows, and that assist with data management and storage needs, are provided by Advanced Research Computing (ARC) and through Consulting for Statistics, Computing, and Analytics Research (CSCAR). For researchers working in clinical and translational science, the Data Management Mentoring Program in MICHR provides mentoring to help research teams more efficiently collect and manage their research data by sharing data management tools and best practices.
Data Storage Services (General)	For researchers needing to identify and compare short-term research data storage options, U-M ITS offers a Data Storage Finder. Electronic lab notebooks are an additional, easy-to-use option. Guidance for archiving clinical research data is available from MICHR.
Data Storage Services (Large Needs)	For large amounts of data and/or large files, ITS Advanced Research Computing (ARC) offers a number of active research data storage services (e.g., OSiRIS, Locker, and Turbo).
High Performance Computing	For researchers requiring high performance computing, ARC provides a number of computational and data storage resources, including the U-M Research Computing Package. Many schools and colleges also offer services in partnership with ITS, including the College of Engineering, Medical School, and LSA.
Research Cores	For data management needs specific to your discipline, there are a number of other services available. Many of the ~100 research cores across U-M, for example, offer data services related to the equipment and/or analyses they provide.
Using Generative AI	The Michigan Institute for Data Science developed a guide on how generative AI can be used in multiple aspects of your research, and on its benefits and risks, based on guidelines published by journals, funding agencies and professional societies.

Data Sharing and Long-Term Preservation

Research data needs change as researchers transition from actively managing a project and/or analyzing data to completing it and/or publishing it. Best practices include archiving or preservation to ensure public access, documentation of metadata to improve discoverability, and increasingly, annotation and deposition of code to ensure reproducibility. The following are examples of resources available to help researchers ensure their data is accessible over the long term.

Need	Resources
General Guidance	General guidance for sharing and preserving data, including how to select a repository, is available as a research guide from the U-M Library. Subject-specific guides are also available for health sciences, engineering, and qualitative sciences.
Long-Term Data Storage	For larger data sets, Advanced Research Computing (ARC)’s Data Den Research Archive can be combined with other services (e.g., Globus, for which U-M has an institutional subscription) to enable long-term archiving of data that isn’t actively being accessed.
Repositories (Digital Research Data)	Hundreds of data repositories are available to researchers depending on your field of study and needs. Re3data and the Open Access Directory maintain lists of repositories by country or field of research. NIH also maintains a list of NIH-supported domain-specific repositories. U-M is a member institution of Qualitative Data Repository (QDR), so researchers can preserve, share, and discover a wide range of digital data and accompanying documentation generated or collected through qualitative and mixed-method research in the social sciences. U-M hosts several research data repositories for use by U-M researchers (and in many cases the broader research community) including: Deep Blue Data, which enables sharing and preserving all types of research data developed at U-M. In the social and behavioral sciences, U-M hosts the Inter-university Consortium for Political and Social Research (ICPSR) for storing, curating, and providing access to data. For materials scientists, U-M hosts Materials Commons for storing and sharing data.
Repositories (Physical Specimens)	In many disciplines, preserving research data can also include the permanent archiving of physical specimens. U-M has a number of world-class facilities and museums that assist researchers with accessing collections or depositing specimens into them. The Research Museums Center has staff who can assist with preservation of samples across anthropological archaeology, botany (and associated disciplines), paleontology, and zoology. The U-M Central Biorepository provides storage options for biospecimens and data associated with or derived from them.
Software and Code Sharing	To make computational code and/or software used to generate or analyze research data publicly available, code should be placed in a known and publicly understood repository such as GitHub, SourceForge, BitBucket or similar. These repositories should be actively maintained with updates, basic use instructions, appropriate licensing terms and an associated copyright notice. U-M Innovation Partnerships should be consulted with respect to best practices, options, approaches and guidance when releasing code open source.

Sharing Data – Informational Bulletin

Type of Protection	Resources
Copyrights	For information about copyright basics and Creative Commons licenses, researchers can review copyright guides or contact the U-M Library’s Copyright Services team directly.
Data Use Agreements (DUAs)	DUAs can be required to transfer non-public data or data that is subject to restrictions on its use. The Office of Research and Sponsored Projects (ORSP) manages these as unfunded agreements (UFA) in eRPM. If patient health information is involved, the Data Office for Clinical & Translational Research (DOCTR) provides a HIPAA review, and if needed, engages the Compliance Office of Michigan Medicine. IHPI maintains a searchable database of data use agreements in the health sciences available to the U-M community.
Publication Repositories	Researchers can deposit publications (before or after final publication) into publicly available repositories to satisfy funder public access requirements for publications, or to simply make scholarly work more widely accessible. U-M provides an institutional repository called Deep Blue Documents to deposit articles, chapters, dissertations, conference presentations, media, and other work produced by the U-M community. Researchers may also choose one of many disciplinary repositories available at the Open Access Directory.
Intellectual Property	Intellectual property, technology licensing, and material transfer agreements, and often data use agreements with corporate sponsors, are handled by U-M’s Innovation Partnerships.
Publisher Data Policies	Publisher requirements for public access to research data have been evolving rapidly over the past several years. For example, many journals have chosen to adopt some or all of the Transparency and Openness Promotion (TOP) Guidelines, which set out modular standards for data citation and availability. Below are links to some large publisher policies and other resources; however, given the fast-changing landscape, we strongly recommend you confirm your individual journal’s policy before submitting your publication, even if you have published in that journal recently. SpringerNature; Wiley; PLOS; Elsevier; Taylor & Francis; SAGE; ICMJE
Open Access Publishing	Many authors choose to publish journal articles or books that are available to any reader at no cost (i.e. “open access” publications). In these cases, the publishing costs are often the responsibility of the authors themselves. The U-M Library has negotiated deals with many scholarly publishers to provide discounts for authors on article processing charges and also offers up to $15,000 for open access monographs in the humanities.

Have Questions?

send questions or information and resources for the site

Want Updates?

SIGN UP FOR THE RDSI EMAIL LIST

JOIN THE RDSI COMMUNITY OF PRACTICE

The RDSI Community of Practice is an open MCommunity group available to all U-M staff and faculty with questions on their work as it relates to research data management.
