Best Practices for the Stewardship of Research Data
Across all stages of the research life cycle and all fields of study, researchers should consider how their decisions will affect the eventual storage and preservation of research data. Below are resources, organized by stage of the research life cycle, that can serve as an entry point into data stewardship practices that help researchers save time, meet funder requirements, and ultimately maximize the impact of their research.
- Study Design and Data Collection
- Data Security & Privacy Protections
- Research Data Management and Storage
- Publishing, Licensing & Data Use Agreements
Setting up a study with data stewardship in mind (for example, by establishing clear protocols for collecting and storing the data it generates) will have substantial downstream benefits. In addition, most agencies now require data management plans (DMPs) or other information about data management and stewardship as part of the proposal submission process.
Below is a selection of resources to help researchers get started as they think through effective data practices while developing their study.
Study or Proposal Element | Resources |
Common Data Elements (CDEs) | CDEs are structured, human- and machine-readable definitions of data elements for use in research and other purposes. NIH maintains a Common Data Elements Repository to help researchers identify standardized terms or concepts used across studies, ranging from surveys to disease nomenclature. |
Metadata | |
Protocols | |
Data Management Plans | Most agencies now require data management plans (DMPs) or other information about data management and stewardship as part of the proposal submission process. |
Proposal Budgeting | Along with new requirements for data management, funding agencies are increasingly allowing data sharing costs to be included as direct costs in proposal budgets. ORSP provides high-level budget and cost guidance for proposals as it relates to the direct costs associated with a project. Absent any prohibition from the funding agency, and in accordance with the applicable terms and conditions of the underlying grant, costs associated with data curation, data formatting, data de-identification, preparation of metadata, and repository data deposition fees may be planned for and included in the proposal as direct costs. |
Discipline-Specific Guidance | Resources |
Clinical Research | U-M researchers can get assistance with the design, conduct, and analysis of clinical trials, including data management and software development, through the Statistical Analysis of Biomedical and Educational Research Group (SABER). Additional data collection guidance and an online course on the fundamentals of data management for clinical research are available from the Michigan Institute for Clinical & Health Research (MICHR). |
Qualitative Research | Qualitative research produces non-numerical data that often require contextual information, which poses additional data management challenges. The Data Curation Network provides a primer on data types in qualitative research to help researchers navigate data needs in these fields. |
Computational Research | Computational research increasingly must grapple with making code and software available in addition to research data. Several sources, including NIH, Software Carpentry, and the Software Sustainability Institute, provide guides to help researchers navigate these challenges. As needed, consult U-M Innovation Partnerships with questions about licensing options, best practices, and IP guidance when releasing code as open source. |
Humanities | Data needs are increasing across the humanities with the rise of digitization. The Digital Humanities Curation Guide provides a compilation of resources to help digital humanities scholars with data curation challenges. |
Diversity Scholarship | An open data toolkit for diversity scholars to guide best practices in collecting, managing, utilizing, sharing and curating research data for the public good is available from the U-M Library. |
Researchers have a number of factors to consider when managing research data, especially when dealing with potentially sensitive information or certain types of regulated data. U-M has a number of resources to help researchers navigate these challenges, depending on the types of data being generated.
Overall Safety/Security Guidance | Resources |
International Collaborations and Export Controls | Some research data may be subject to restrictions on whether and how they can be shared with foreign countries, persons, or entities. U-M Export Controls can help researchers ensure compliance with all applicable regulations and create technology control plans (TCPs), if necessary. |
Research Data Security | Several types of research data require specific protections based on various university and legal requirements. U-M’s Research Information Security Oversight (RISO) Program works with PIs to determine which, if any, additional controls are required. |
Safe Computing | To protect yourself and your research data from phishing attacks or other electronic vulnerabilities, U-M provides high-level safe computing resources, including a sensitive data guide. |
Research Related to Human Subjects | Resources |
General Guidance | To help researchers maintain human subject data securely with the appropriate level of anonymity, confidentiality, or de-identification, refer to human subject data security guidance (including a checklist). |
Compliance Reviews | Researchers can receive objective analysis and evaluation of research compliance, including data security and confidentiality for human subjects studies, from the Office of Research Compliance Review. |
Data Transfer Agreements (Michigan Medicine) | When working with protected health data from Michigan Medicine, data transfer agreements associated with individual-level patient/participant data or biospecimens are reviewed by the Medical School Data Release Committee. |
Diversity, Equity and Inclusion | An introduction to the intersections between DEI and research data use is available in a 2020 report on Principles for Advancing Equitable Data Practice. |
Student Data | For educational research sponsored by the U.S. Department of Education, U-M Research Ethics and Compliance provides additional guidance on the Family Educational Rights and Privacy Act (FERPA) and other regulations. |
Short-Term Research Data Management and Storage
Responsibly and strategically managing research data streams during a study can go a long way towards improving the long-term impact and replicability of your research. There are many resources available to U-M researchers to assist in various aspects of research data management and/or analysis across a number of disciplines or approaches. Some university-wide examples are provided below.
Need | Resources |
Consulting Services | |
Data Storage Services (General) | |
Data Storage Services (Large Needs) | For large amounts of data and/or large files, ITS Advanced Research Computing (ARC) offers a number of active research data storage services (e.g., OSiRIS, Locker, and Turbo). |
High-Performance Computing | For researchers requiring high-performance computing, ARC provides a number of computational and data storage resources, including the U-M Research Computing Package. Many schools and colleges, including the College of Engineering, the Medical School, and LSA, also offer services in partnership with ITS. |
Research Cores | For data management needs specific to your discipline, there are a number of other services available. Many of the ~100 research cores across U-M, for example, offer data services related to the equipment and/or analyses they provide. |
Data Sharing and Long-Term Preservation
Research data needs change as researchers transition from actively managing a project and/or analyzing data to completing it and/or publishing it. Best practices include archiving or preservation to ensure public access, documentation of metadata to improve discoverability, and increasingly, annotation and deposition of code to ensure reproducibility. The following are examples of resources available to help researchers ensure their data is accessible over the long term.
Need | Resources |
General Guidance | General guidance for sharing and preserving data, including how to select a repository, is available as a research guide from the U-M Library. Subject-specific guides are also available for health sciences, engineering, and qualitative sciences. |
Long-Term Data Storage | For larger data sets, Advanced Research Computing (ARC)’s Data Den Research Archive can be combined with other services (e.g., Globus, for which U-M has an institutional subscription) to enable long-term archiving of data that isn’t actively being accessed. |
Repositories (Digital Research Data) | |
Repositories (Physical Specimens) | |
Software and Code Sharing | To make computational code and/or software used to generate or analyze research data publicly available, place the code in a well-known public repository such as GitHub, SourceForge, Bitbucket, or similar. These repositories should be actively maintained, with updates, basic usage instructions, appropriate licensing terms, and an associated copyright notice. Consult U-M Innovation Partnerships for best practices, options, approaches, and guidance when releasing code as open source. |
Type of Protection | Resources |
Copyrights | For information about copyright basics and Creative Commons licenses, researchers can review copyright guides or contact the U-M Library’s Copyright Services team directly. |
Data Use Agreements (DUAs) | |
Publication Repositories | |
Intellectual Property | Intellectual property, technology licensing, material transfer agreements, and often data use agreements with corporate sponsors are handled by U-M Innovation Partnerships. |
Publisher Data Policies | Publisher requirements for public access to research data have evolved rapidly over the past several years. For example, many journals have adopted some or all of the Transparency and Openness Promotion (TOP) Guidelines, which set out modular standards for data citation and data availability. Below are some large publishers' policies and other resources; however, given the fast-changing landscape, we strongly recommend confirming your journal's current policy before submitting your publication, even if you have published in that journal recently. Springer Nature; Wiley; PLOS; Elsevier; Taylor & Francis; SAGE; ICMJE |
Open Access Publishing | Many authors choose to publish journal articles or books that are available to any reader at no cost (i.e., “open access” publications). In these cases, the publishing costs are often the responsibility of the authors themselves. The U-M Library has negotiated agreements with many scholarly publishers that give authors discounts on article processing charges, and it also offers up to $15,000 for open access monographs in the humanities. |