4R Fund Research Repository


01 May 2015

Project Description

Project Overview
The objective of this proposal is to develop a standard data repository and preservation framework for 4R Fund projects. The research repository (RR) and framework will ensure that data and metadata are standardized across projects, widely accessible, consistent with emerging open access data principles, and archived for long-term preservation and reuse. We will model the 4R Fund RR on the framework already developed for curation and preservation of data from the Purdue University Water Quality Field Station (WQFS). The WQFS includes a wide variety of data that are similar to those being collected in 4R Fund projects. The WQFS core database is housed within the Purdue University Research Repository (PURR), managed by Purdue Libraries; WQFS researchers have collaborated with PURR and Purdue Libraries to develop workflows, procedures and policies for the curation, preservation and publication of agricultural datasets that meet or exceed emerging requirements for “open access” to data as a public good. Faculty from Purdue Libraries and the Department of Agronomy will collaborate with the research teams of 4R Fund projects to 1) describe, annotate and otherwise prepare existing datasets from 4R Fund projects for open access and publication, 2) assess and, where necessary, improve the data collection workflows and annotation practices of new 4R Fund projects so that completed datasets are “publication ready,” 3) create policy and protocols for a 4R Fund RR that meet grantor and grantee needs for open access, privacy, and embargoing, and 4) create guides and self-help tools for future 4R Fund researchers to ensure compliance with 4R Fund RR policy for data standardization, interoperability, open access, and other “best practices” in data management. The collaboration will be facilitated by Paul Fixen and Scott Murrell (International Plant Nutrition Institute).

Timeline and Plan of Work:
The timeline for the project is 18 months. Plan of work elements and their approximate timelines are as follows:

Phase 1 (0 – 4 months): The project will commence with the hiring of the Dedicated (1.0 FTE) Postdoctoral Associate. Working closely with the 4R Fund RR Development Team, the postdoctoral associate will initiate contact with all 4R Fund projects and conduct a series of phone interviews as a preliminary needs assessment, following the general template of the Data Curation Profiles developed by Purdue Libraries. This assessment will collect examples of all types of data being collected by 4R Fund projects and compare them to the existing standards identified in Phase 2 (below). A project-specific agenda for visiting each research team will be developed; each project visit is anticipated to require 2 days on site, covering an introduction to repository concepts and general best practices for data and a comprehensive assessment of the project’s current data status and workflow needs. During the project visits, Data Curation Profiles will be completed for as many project researchers as possible, including lead researchers, technicians and graduate students.

Phase 2 (1 – 14 months): In Phase 2, 4R Fund data will be ingested into PURR, standardized and described; where needed, templates will be developed and distributed to existing projects to facilitate ongoing data collection. Specific key activities are:
    · Develop the 4R Fund RR workspace within PURR and integrate 4R Fund projects into the PURR secure workspace.
    · Complete a comprehensive environmental scan of related data standardization and repository development activities to coordinate the 4R Fund RR with similar pilot activities under development or under way at the National Agricultural Library, USDA AFRI-NIFA, etc.
    · Characterize deficits in data and metadata standardization of ingested data, comprehensively annotate and standardize the data, and develop templates and best-practice workflows and disseminate them to ongoing projects to overcome the deficits and improve the efficiency of future data ingestion into PURR.
    · Customize a pre-existing Data Management Planning (DMP) Tool to the specific needs of 4R Fund researchers and prepare associated guides for use of the tool with new 4R Fund projects.
    · Develop a business, policy and governance model to ensure sustainability of the 4R Fund RR including anticipated cost estimates for one or more viable options.
Note: for data ingestion we will prioritize the meta-analysis projects and any project funded in the first round of 4R Fund proposals that is close to completion. However, we will attempt to provide preliminary guidance to all funded projects as soon as possible so that existing and ongoing projects can address any critical changes to their data management quickly and efficiently. Also, while we anticipate that continued housing of the 4R Fund RR within PURR may be a long-term option, Purdue Libraries is only just beginning to develop its policies for such agreements. Further, other options will likely be available to the 4R Fund for housing and sustaining its RR, and we will attempt to identify and characterize these alternatives.

Phase 3 (12 – 18 months): The final project phase will be dedicated to ensuring the long-term sustainability of the 4R Fund RR and its utility to future projects. Lessons learned from Phases 1 and 2 will be incorporated into a data management guide for the 4R Fund. The following project outputs will be completed in Phase 3:
    · An on-line DMP tool tailored to the anticipated needs of the 4R Fund and IPNI-facilitated plant nutrition – soil fertility projects;
    · A best-practice toolkit including written guides for collection of agronomic data, inclusive of candidate data and metadata standards;
    · A “portable” 4R Fund RR that may remain at PURR or could be relocated from PURR to another repository entity should that be deemed most beneficial to the 4R Fund Management and/or Purdue University; and
    · A technical report on business, policy and governance models for research repositories including specific analysis of options and associated / anticipated costs for the 4R Fund RR.

Note: any standards developed through this project are likely to be considered “candidate” standards because the project is too short to achieve broad acceptance from the research community. However, whatever candidate standards are developed or identified during this project have exceptional potential to become the de facto standards with continued use, dissemination and promotion. Portability of the 4R Fund RR is important because distributed RRs and their business models are evolving rapidly, and the final disposition of the 4R Fund RR should be flexible enough to take advantage of the most mature options available at the project’s end.

4R Fund RR Development Team ~ Roles and Responsibilities:
Purdue University Department of Agronomy: Sylvie Brouder, Jeff Volenec and Ron Turco from the Department of Agronomy have expertise in plant nutrition/soil fertility, plant physiology, water quality and soil microbiology, physical properties and chemistry. They will provide guidance from the Agronomy perspective on types of data and cultures of practice, helping PURR and Purdue Libraries understand the attributes of agronomic data that are critical to the standardization and comprehensive description of 4R Fund RR data.

International Plant Nutrition Institute: Scott Murrell will serve as a technical advisor to the project to facilitate communication in the development of 4R Fund RR best practices for data management. Paul Fixen will be the primary liaison between the 4R Fund RR Development Team and the 4R Fund RR Committee.

Purdue Libraries: Marianne Bracke will work closely with the Dedicated Postdoctoral Associate throughout all phases of the project to characterize the workflows necessary to ingest 4R Fund data and will interface with PURR personnel in the design of the 4R Fund space within PURR. Amy Barton will advise on metadata standards. A graduate student (0.25 Research Assistantship for 12 months only) will work on the development of business, policy and governance models for novel public-private partnerships such as the 4R Fund RR within PURR. This work will dovetail with other ongoing RR pilot projects (e.g., with NAL) and will be overseen by Paul Bracke, Associate Dean of Research for Purdue Libraries.

Dedicated Postdoctoral Associate: A full-time postdoctoral associate will have the domain knowledge necessary to understand the data and culture of the researchers leading 4R Fund projects and will be mentored by Marianne Bracke in the additional informatics skills required to undertake this project. This individual will be the main point of contact with the 4R Fund projects and will work with the researchers to assist them in data preparation. The postdoctoral associate will also be responsible for characterizing major trends in the data cultures of the researchers to inform the development of the DMP tool and the best-practice toolkit.