Research Data Stewardship Resources
RDSI Informational Webinar
Why research data stewardship is important
Sharing research data is important for many reasons, and it can benefit not only research (and researchers) but the public as well.
Benefits of Research Data Stewardship and Sharing
- Making data publicly available can improve reproducibility and replicability (National Academies of Sciences, Engineering, and Medicine)
- Data that are Findable, Accessible, Interoperable, and Reusable (FAIR) help producers and users of research data (GO FAIR Initiative)
- Responsible stewardship of research data can help address many DEI issues that arise during research projects (Principles for Advancing Equitable Data Practice)
- Accessible data inspires more public trust in research (Pew Research Center)
- Papers that link to accessible research data get cited more (PLOS One)
Further Reading
- Ten Reasons to Share Your Data (Nature Index)
- Ten Simple Rules for Improving Research Data Discovery (PLOS Computational Biology)
- Why Share Your Data? (U.S. Geological Survey)
- Sharing Health Data: The Why, the Will, and the Way Forward (National Academy of Medicine)
- Public Access to Research Data (Association of Public and Land Grant Universities)
Relevant U-M Policies
- Research Data Stewardship Policy (303.06)
- Institutional Data Resource Management Policy (601.12)
- Information Security (601.27)
- Technology Transfer Policy (303.04)
- Who Holds Copyright at or in Affiliation with the University of Michigan (601.28)
- Policy Statement on the Integrity of Scholarship (303.03)
Funding Agency Research Data Policies & Resources
U.S. funding agency requirements for public access to research data have been evolving rapidly over the past several years. Below are links to a number of individual policies; however, given the fast-changing landscape around agency policies, we strongly recommend you confirm your funder’s policy before submitting any proposal. For agencies not in this list, see the curated list of data sharing requirements by federal agency provided by the Scholarly Publishing and Academic Resources Coalition (SPARC).
Federal Funding Agency Research Data Policies and Related Information
Agency | Policy | Effective date | Further information |
NIH | Changes to Data Management and Sharing (DMS) Plan Progress Reporting and the Submission of Revised DMS Plans Are Coming on October 1 | Oct 1, 2024 | |
NIH | Policy for Data Management and Sharing | Jan 25, 2023 | |
NSF | Data Sharing Policy | June 1, 2020 | Dissemination and Sharing of Research Results from NSF |
DOD | Plan to Establish Public Access to the Results of Federally Funded Research | Feb 2015 | |
Dept of Energy | Policy for Digital Research Data Management | Oct 1, 2015 | DOE Public Access Plan |
Dept of Education | IES Policy Regarding Public Access to Research | October 21, 2016 | Implementation Guide for Public Access to Research Data |
NEH | Data Management Plans for NEH Office of Digital Humanities Proposals and Awards | June 2018 | |
NASA | Scientific Information policy for the Science Mission Directorate [DRAFT] | Nov 2021 | Open-Source Science Initiative |
Types of Data Repositories
There are three types of research data repositories:
- Domain / Subject Data Repositories – Domain repositories focus on hosting and disseminating specific types of data or supporting particular fields of research.
- Examples – ICPSR for Social Science data, Gene Expression Omnibus (GEO) for Genomics data, or the Virtual Astronomical Observatory (VAO) for Astronomy data.
- Generalist Data Repositories – Generalist data repositories collect a wide variety of data types and do not limit deposits based on the institutional affiliation of the depositor.
- Examples – Dryad, FigShare, Open Science Framework
- Institutional Data Repositories – Institutional Data Repositories accept a wide variety of data, but place limits on who can deposit data based on institutional affiliation.
- Example – Deep Blue Data, the Institutional Data Repository for the University of Michigan
Questions to Ask in Considering a Data Repository
Is there a Domain Repository that serves your research field?
Generally, the best place to deposit your research data is with a repository that is connected to your particular research field or discipline and/or specializes in hosting, disseminating and preserving the type of data that you have generated. Domain repositories work closely with the research communities who make use of the data they host to ensure that it is suitable and accessible for their specific needs. Depositing your data into a domain repository helps connect it with the community that is most likely to benefit from and engage with your work.
Not every research community has developed a domain repository of their own yet. In cases where a domain repository does not exist or it is not suitable for your data, you may want to consider a generalist or institutional data repository.
Does your funding agency or institution have specific requirements for your data that you need to meet?
As you are developing your grant application be sure to pay special attention to the details of the data sharing requirements of the funding agency and any additional stipulations made in the specific program that you are applying to. These requirements should be taken into consideration in selecting a data repository to host your data and in how you develop and document your data set for later reuse by others.
Some funding agencies provide guidance on selecting an appropriate repository for your data (for example, the NIH has a web page on Selecting a Data Repository). Your funding agency’s program officers may also be able to provide some guidance on selecting an appropriate repository.
Are there any costs associated with depositing your data with a repository?
Data repositories, like other infrastructure that supports research, incur management and operational costs from providing their services. Some repositories charge fees to deposit or host data in order to cover their expenses. Be sure that you understand and budget for any fees associated with depositing data into a repository before identifying it as your repository of choice in your grant application. Many funding agencies permit the inclusion of expenses related to managing or disseminating the data from the research, but these expenses must be identified and accounted for in order to be addressed effectively in the project’s budget.
Are there any limitations on how much or what kind of data a repository can accept, host, disseminate or preserve?
Different repositories may have different limits or constraints on what kind of and how much data they can accept. Though it may be a bit challenging to do, it’s a good idea to estimate the size and complexity of the data that you will need to share as you plan your project in order to determine if the repository you are considering can reasonably host, disseminate and preserve it.
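As a rough illustration of the estimate described above, the following minimal Python sketch multiplies an expected file count by an average file size and adds headroom for documentation and derived files. All of the numbers, the 20% overhead, and the 500 GB limit are hypothetical placeholders, not figures from any particular repository; check the actual limits with the repository you are considering.

```python
def estimate_dataset_size_gb(files_per_year, avg_file_mb, years, overhead=0.2):
    """Rough total size in GB (1 GB = 1000 MB).

    `overhead` is an assumed fraction of extra space for documentation,
    metadata, and derived files; adjust it to fit your project.
    """
    raw_mb = files_per_year * avg_file_mb * years
    return raw_mb * (1 + overhead) / 1000


def fits_repository(size_gb, repo_limit_gb):
    """Return True if the estimate is within the repository's stated limit."""
    return size_gb <= repo_limit_gb


# Hypothetical project: 2,000 files per year, about 50 MB each, over 3 years.
estimate = estimate_dataset_size_gb(files_per_year=2000, avg_file_mb=50, years=3)
print(f"Estimated size: {estimate:.0f} GB")
print("Fits under a hypothetical 500 GB limit:", fits_repository(estimate, 500))
```

An estimate like this is only a planning aid; complexity (number of file formats, folder structure, linked files) also matters and is not captured by a single size figure.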
Does the repository take measures to preserve the data it hosts to ensure long term access?
Funding agencies generally expect that your data will be accessible well after the grant has ended. Every repository has its own level of commitment and dedicated resources to the preservation of the data that it stewards. When considering a data repository, be sure to read through its preservation policy and make note of the associated preservation services that it offers to be sure that your needs are addressed.
Does the repository have a stable source of long term funding? What happens to your data if the repository shuts down?
Data sharing requirements are still relatively new in many disciplines and so the repositories and other infrastructure being developed to support researchers may not have a dedicated or reliable source of funding in place. In addition to exploring the capabilities of the repository, take some time to investigate its business model and its financial stability. It’s also a good idea to get an understanding of the repository’s disaster management plan and what happens to the data it hosts if it ceases operations.
Does the repository support the use of unique identifiers? Can people cite your data set if they use it in their research?
Making your data publicly available to others should directly benefit you and increase your professional reputation. A repository that assigns Digital Object Identifiers (DOIs), or other unique identifiers, to the data sets that it hosts enables others to cite the data set when it is used, just as they would cite a journal article or book.
Does your data include human subjects or sensitive information that would require limitations on access?
Not all data can be shared publicly. Data that include human subjects or other sensitive information may be subject to additional considerations when deciding on a repository. It’s important to work with your IRB to determine to what extent a data set can be shared outside of the study team and what measures will need to be taken to ensure that sharing the data set does not place subjects at risk. Different repositories have different capabilities in handling sensitive data. Some are not able to accept sensitive data that has not been sufficiently deidentified. Many other repositories do not have the safeguards needed to accept sensitive data. If you are generating data that is considered sensitive in nature, be sure to investigate the repository’s capabilities before selecting it.
U-M Communities of Practice
- The Coderspaces community provides a forum and office hours to assist faculty, staff, and students with research methodology, statistics, data science applications, and computational programming for research.
- The Data Analysis Networking Group (DANG!) is a forum for U-M post-docs, grad students, and other researchers to discuss how to analyze, present, and visualize their data.
- The U-M Software / Data Carpentries is building a community of excellence at U-M around the area of reproducible data analysis.
- MIDAS Reproducibility Hub promotes reproducible data science through raising awareness, celebrating best practices, enabling the scholarly investigation of reproducible research, and developing tools that can be widely adopted.
- The Research Data Stewardship Initiative (RDSI) offers the community of U-M data support providers a forum to share challenges and learn strategies related to the changing research data management and sharing requirements.
Data
Even the term “data” itself carries a wide range of meanings depending on discipline and research method. According to NIH.gov, research data includes the recorded factual material commonly accepted in research communities as necessary to validate and replicate findings, regardless of whether the data are used to support scholarly publications. This definition would exclude preliminary analyses, completed case report forms, drafts of publications, plans for future research, peer reviews, or communications with colleagues. However, in some disciplines, research data can more broadly include physical objects and other information, such as specimens, archival materials, collections, and notebooks, that help support the provenance of the data.
Data Management Plan (DMP)
A document describing the actions to be taken over the course of the life cycle of a research dataset to ensure that it is well managed, and will eventually be findable, accessible, interoperable and reusable by others. These are required in proposals to federal funding agencies.
Digital Persistent Identifier (DPI)
These enable consistent citation and reuse of scholarly works, datasets and funding sources, most commonly using the digital object identifier (DOI) system. Repositories assign DOIs when datasets are deposited. Datasets are citable in scholarly works using DOIs similar to any other reference. Definition adapted from DataCite.org.
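To make the idea of citing a dataset by DOI concrete, here is a minimal Python sketch that builds a resolvable DOI link and a simple dataset citation. The DOI, author, title, and repository name are hypothetical placeholders, and the citation layout is only one common pattern; follow the style required by your journal or repository.

```python
def doi_url(doi):
    """Turn a bare DOI into a link through the public doi.org resolver."""
    return f"https://doi.org/{doi}"


def cite_dataset(creators, year, title, repository, doi):
    """Assemble a simple dataset citation (illustrative layout only)."""
    return f"{creators} ({year}). {title} [Data set]. {repository}. {doi_url(doi)}"


# Hypothetical example values; "10.1234/example.5678" is not a real DOI.
citation = cite_dataset(
    creators="Doe, J.",
    year=2024,
    title="Example Survey Responses",
    repository="Example Repository",
    doi="10.1234/example.5678",
)
print(citation)
```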
FAIR Data
An increasingly common term describing efforts to make research data more Findable, Accessible, Interoperable, and Reusable (FAIR). Definition adapted from Go-Fair.org.
Metadata
Information about a research data set that is structured (often in machine-readable format) for purposes of search and retrieval. Metadata elements may include basic information (e.g. title, author, date created, etc.) and/or specific elements inherent to datasets (e.g., spatial coverage, time periods, provenance). Definition adapted from DataCurationNetwork.org.
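As an illustration of structured, machine-readable metadata, the sketch below stores the kinds of elements mentioned above in a Python dictionary, checks that the basic ones are present, and serializes the record to JSON. The field names and values are hypothetical and do not follow any specific metadata standard; repositories typically define their own required schema.

```python
import json

# Hypothetical metadata record; field names are illustrative only.
record = {
    "title": "Example Stream Temperature Measurements",
    "creators": ["Doe, J.", "Roe, R."],
    "date_created": "2024-05-01",
    "spatial_coverage": "Example County, Michigan",
    "temporal_coverage": {"start": "2021-06-01", "end": "2023-09-30"},
    "provenance": "Collected with handheld sensors; cleaned with project scripts.",
}

# Basic elements assumed to be required for search and retrieval.
REQUIRED_FIELDS = ("title", "creators", "date_created")


def missing_fields(metadata):
    """Return the required fields that are absent or empty."""
    return [field for field in REQUIRED_FIELDS if not metadata.get(field)]


print("Missing fields:", missing_fields(record))
print(json.dumps(record, indent=2))
```

Keeping metadata in a structured form like this, rather than in free text, is what allows repositories and search tools to index and retrieve a dataset.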
Open Access
A general term typically used to describe free access to datasets or publications with no restrictions on accessibility, (re)use, and redistribution (distinct from public access, below). Definition adapted from UNESCO.org.
Preservation
The series of managed activities necessary to ensure continued access and readability to research data sets for as long as necessary with adequate security and risk mitigation. Definition adapted from DPCOnline.org.
Public Access
Data are readily discoverable and accessible to other researchers and people outside of the research project in which the data were generated. Public access is distinct from open access, defined above, in that something may be publicly available but may also have some restrictions on accessibility, reuse, or redistribution. Information on federal agencies’ public access policies can be found in the proposal development section of the guidance and policies page.
Repository
A specialized database that preserves, stewards, and provides access to many types of digital datasets in a variety of formats. Data repositories may focus on a specific field (such as ICPSR for Social Sciences), an institution (such as Deep Blue Data for U-M), or serve a general audience (such as Dryad). Definition adapted from CASRAI.org.
Across all stages of the research life cycle and all fields of study, researchers should consider the potential long-term impacts on the eventual storage and preservation of research data. Below are some resources by stage of the research life cycle that can serve as an entry point into data stewardship practices that will help researchers save time, address funder requirements, and ultimately maximize the impact of their research.
- Study Design and Data Collection
- Data Security & Privacy Protections
- Research Data Management and Storage
- Publishing, Licensing & Data Use Agreements
Setting up a study with data stewardship in mind–such as clear protocols for the collection and storage of data generated–will have tremendous downstream benefits. In addition, most agencies now require data management plans (DMPs) or other information about data management and stewardship as part of the proposal submission process.
Below is a selection of resources that can help researchers get started as they think through effective data practices while developing their study.
Study or Proposal Element | Resources |
Common Data Elements (CDEs) | CDEs are structured human and machine-readable definitions of data elements for use in research and for other purposes. NIH has a Common Data Elements Repository to help researchers identify standardized terms or concepts used across studies ranging from surveys to disease nomenclature. |
Metadata |
|
Protocols |
|
Data Management Plans | Most agencies now require data management plans (DMPs) or other information about data management and stewardship as part of the proposal submission process. |
Proposal Budgeting | Along with new requirements for data management, funding agencies are increasingly allowing data sharing costs to be included as direct costs in proposal budgets. ORSP provides high-level budget and cost guidance for proposals as it relates to direct costs associated with a project. Absent any prohibition from the funding agency, and in accordance with the applicable terms and conditions of the underlying grant, costs associated with data curation, data formatting, data de-identification, preparation of metadata, and repository data deposition fees may be planned for and included in the proposal as direct costs. |
Discipline-Specific Guidance | Resources |
Clinical Research | U-M researchers can get assistance with the design, conduct, and analysis of clinical trials, including data management and software development, through the Statistical Analysis of Biomedical and Educational Research Group (SABER). Additional data collection guidance and an online course on the fundamentals of data management related to clinical research are available from the Michigan Institute for Clinical & Health Research (MICHR). |
Qualitative Research | Qualitative research produces non-numerical data and often requires contextual information that poses additional data management challenges. The Data Curation Network provides a primer on data types in qualitative research to help researchers navigate data needs in these fields. |
Computational Research | Computational research increasingly must grapple with making code and software available, in addition to research data. Several sources provide guides for researchers on how to navigate these challenges, including NIH, Software Carpentry, and the Software Sustainability Institute. If necessary, please consult with U-M Innovation Partnerships for questions about licensing options, best practices, and guidance about IP when releasing code as open source. |
Humanities | Data needs are increasing across the humanities with the rise of digitization. The Digital Humanities Curation Guide provides a compilation of resources to help digital humanities scholars with data curation challenges. |
Diversity Scholarship | An open data toolkit for diversity scholars to guide best practices in collecting, managing, utilizing, sharing and curating research data for the public good is available from the U-M Library. |
Researchers have a number of factors to consider when managing research data, especially when dealing with potentially sensitive information or certain types of regulated data. U-M has a number of resources available to help researchers navigate these challenges depending on which types of data are being generated.
Overall Safety/Security Guidance | Resources |
International Collaborations and Export Controls | Some research data may have restrictions on if/how they can be shared with foreign countries, persons, or entities. U-M Export Controls can help researchers ensure compliance with all appropriate regulations and create technology control plans (TCPs), if necessary. |
Research Data Security | Several types of research data require specific protections based on various university and legal requirements. U-M’s Research Information Security Oversight (RISO) Program works with PIs to determine which, if any, additional controls are required. |
Safe Computing | To protect yourself and your research data from phishing attacks or other electronic vulnerabilities, U-M provides high-level safe computing resources, including a sensitive data guide. |
Research Related to Human Subjects | Resources |
General Guidance | To help researchers maintain human subject data securely with the appropriate level of anonymity, confidentiality, or de-identification, refer to human subject data security guidance (including a checklist). |
Compliance Reviews | Researchers can receive objective analysis and evaluation of research compliance, including data security and confidentiality for human subjects studies, from the Office of Research Compliance Review. |
Data Transfer Agreements (Michigan Medicine) | When working with protected health data from Michigan Medicine, data transfer agreements associated with individual-level patient/participant data or biospecimens are reviewed by the Medical School Data Release Committee. |
Diversity, Equity and Inclusion | An introduction to the intersections between DEI and research data use is available in a 2020 report on Principles for Advancing Equitable Data Practice. |
Student Data | For educational research sponsored by the U.S. Department of Education, U-M Research Ethics and Compliance provides additional guidance around the Family Educational Rights and Privacy Act (FERPA) and other regulations. |
- Short-Term Research Data Management and Storage
- Data Sharing and Long-Term Preservation
Short-Term Research Data Management and Storage
Responsibly and strategically managing research data streams during a study can go a long way towards improving the long-term impact and replicability of your research. There are many resources available to U-M researchers to assist in various aspects of research data management and/or analysis across a number of disciplines or approaches. Some university-wide examples are provided below.
Managing & Organizing Data – Informational Bulletin
Need | Resources |
Consulting Services |
|
Data Storage Services (General) |
|
Data Storage Services (Large Needs) | For large amounts of data and/or large files, ITS Advanced Research Computing (ARC) offers a number of active research data storage services (e.g., OSiRIS, Locker, and Turbo). |
High-Performance Computing | For researchers requiring high-performance computing, ARC provides a number of computational and data storage resources, including the U-M Research Computing Package. Many schools and colleges also offer services in partnership with ITS, including the College of Engineering, Medical School, and LSA. |
Research Cores | For data management needs specific to your discipline, there are a number of other services available. Many of the ~100 research cores across U-M, for example, offer data services related to the equipment and/or analyses they provide. |
Using Generative AI | The Michigan Institute for Data Science developed a guide on how Generative AI can be used in multiple aspects of your research, based on published guidelines by journals, funding agencies and professional societies, as well as other benefits and risks. |
Data Sharing and Long-Term Preservation
Research data needs change as researchers transition from actively managing a project and/or analyzing data to completing it and/or publishing it. Best practices include archiving or preservation to ensure public access, documentation of metadata to improve discoverability, and increasingly, annotation and deposition of code to ensure reproducibility. The following are examples of resources available to help researchers ensure their data is accessible over the long term.
Need | Resources |
General Guidance | General guidance for sharing and preserving data, including how to select a repository, is available as a research guide from the U-M Library. Subject-specific guides are also available for health sciences, engineering, and qualitative sciences. |
Long-Term Data Storage | For larger data sets, Advanced Research Computing (ARC)’s Data Den Research Archive can be combined with other services (e.g., Globus, for which U-M has an institutional subscription) to enable long-term archiving of data that isn’t actively being accessed. |
Repositories (Digital Research Data) |
|
Repositories (Physical Specimens) |
|
Software and Code Sharing | To make computational code and/or software used to generate or analyze research data publicly available, code should be placed in a known and publicly understood repository such as GitHub, SourceForge, BitBucket or similar. These repositories should be actively maintained with updates, basic use instructions, appropriate licensing terms and an associated copyright notice. U-M Innovation Partnerships should be consulted with respect to best practices, options, approaches and guidance when releasing code open source. |
Sharing Data – Informational Bulletin
Type of Protection | Resources |
Copyrights | For information about copyright basics and Creative Commons licenses, researchers can review copyright guides or contact the U-M Library’s Copyright Services team directly. |
Data Use Agreements (DUAs) |
|
Publication Repositories |
|
Intellectual Property | Intellectual property, technology licensing, and material transfer agreements, and often data use agreements with corporate sponsors, are handled by U-M’s Innovation Partnerships. |
Publisher Data Policies | Publisher requirements for public access to research data have been evolving rapidly over the past several years. For example, many journals have chosen to adopt some or all of the Transparency and Openness Promotion (TOP) Guidelines, which require modular data citation and availability standards. Below are links to some large publisher policies and other resources; however, given the fast-changing landscape, we strongly recommend you confirm your individual journal’s policy before submitting your publication, even if you have published in that journal recently. SpringerNature; Wiley; PLOS; Elsevier; Taylor & Francis; SAGE; ICMJE |
Open Access Publishing | Many authors choose to publish journal articles or books that are available to any reader at no cost (i.e. “open access” publications). In these cases, the publishing costs are often the responsibility of the authors themselves. The U-M Library has negotiated deals with many scholarly publishers to provide discounts for authors on article processing charges and also offers up to $15,000 for open access monographs in the humanities. |
Have Questions?
Want Updates?
The RDSI Community of Practice is an open MCommunity group available to all U-M staff and faculty with questions on their work as it relates to research data management.