NEON Program Enters Collaboration with Environmental Data Initiative
August 4, 2021
The NEON program has joined forces with the Environmental Data Initiative (EDI) to promote data accessibility and usability in the environmental sciences. The joint initiative will create tools, templates, and standards that will make it easier to synthesize data from the NEON program, the Long Term Ecological Research Network (LTER), and other networks and organizations. It also provides a place for individual researchers to publish derived data and to search and discover data sets submitted by other researchers. One of the primary goals of the joint initiative is to facilitate the creation of an inclusive and diverse community of environmental data providers and users.
The Environmental Data Initiative and FAIR Data
EDI was founded in 2016 as a collaborative initiative between the University of New Mexico and the University of Wisconsin. The project, which is funded by the National Science Foundation (NSF), grew out of more than 30 years of data management experience in the LTER community. Today, EDI supports and enables data curation and reuse for a broad community of ecological data providers and users, including, but not limited to, scientists funded by several NSF programs.
EDI brings together data scientists, environmental researchers, and software developers to create standards and software tools that enable FAIR data, as outlined by the GO FAIR initiative. GO FAIR seeks to make scientific data more open and inclusive for individuals and organizations. The FAIR principles are:
- Findable: Data and metadata should be easy for both humans and computers to find to enable discovery.
- Accessible: Data and metadata should be easy to access using free and open communication protocols.
- Interoperable: Data and metadata should use standardized formats and language so that data from different researchers and databases can be integrated and analyzed together.
- Reusable: Data should be clearly described in the metadata (including provenance and other relevant attributes) and meet domain-relevant community standards to enable replication and reuse.
EDI operates a secure data repository for data and metadata derived from publicly funded research. Participating organizations and individual researchers add ecological data and metadata to the repository, where they can be discovered and used by other researchers. EDI works closely with LTER, NEON, DataONE, and other organizations to promote data interoperability between networks and to provide training and support for individual data contributors and users. Data in the EDI repository can be discovered and accessed through the EDI, DataONE, or Google Dataset Search portals.
EDI and the NEON Program
The collaboration between EDI and the NEON program is a natural fit. Both organizations are focused on making ecological data freely available and accessible to the science community.
EDI and the NEON program have defined the following key goals for the joint project:
- Build a diverse and inclusive community of data providers and users that manages data using best practices and standards in the environmental sciences.
- Develop recommended best practices and standards for data users and providers.
- Improve the discoverability and interoperability of environmental data to foster understanding across spatial and temporal scales.
In other words, we are working together to make data and data tools more accessible and usable for a broader range of people and a wider variety of projects. Ultimately, the initiative will make it easier for researchers to:
- Publish data derived from NEON's data products in the EDI data repository, where they will be findable and accessible for other users.
- Pull datasets from multiple databases (such as NEON and LTER) in a way that allows them to be integrated, searched, compared, and analyzed simultaneously.
- Democratize data through open access and transparency and reduce time-to-knowledge.
One example of derived data already available on the EDI data portal is "Dissolved Greenhouse Gas Concentrations," derived from the NEON "Dissolved Gases in Surface Water" data product. The data set combines carbon dioxide, methane, and nitrous oxide concentrations from measurements taken at NEON aquatic field sites.
There are many other possible derived data sets that researchers may want to create from the NEON data products. For example, the NEON program does not provide "productivity" as a data product, but all the pieces needed to calculate productivity are in the NEON data portal. Researchers can create secondary derived data products from the NEON data products using a variety of software tools. Making these derived data sets available for other researchers expands the use cases for NEON data. The EDI data portal provides a centralized and standardized way for researchers to share these derived data sets.
Creating Tools and Templates for the NEON Data User Community
The NEON program and EDI are working together to create standards and tools to make it easier to synthesize NEON data with other EDI datasets. A working group with representatives from both organizations has developed guidelines for publishing datasets derived from NEON, to help make the data more discoverable, accessible, reproducible, and reusable. The group is in the process of creating a NEON-EDI data management plan template as a resource to help with proposal writing and to facilitate consistency in how NEON data users manage and publish derived data.
The partnership has also shaped the development of a biodiversity data standard, the ecological community data design pattern (abbreviated as "ecocomDP"), and an R package of the same name that provides a library of tools for creating and working with biodiversity data in the ecocomDP format. The ecocomDP library for R allows users to discover and synthesize biodiversity data from the NEON data portal with similar data from other networks, such as the US LTER program. It serves data to users from both repositories in a standard intermediate data format (the "ecocomDP" format) so they can be searched, sorted, and analyzed together. The development of the data design pattern was led by Margaret O'Brien at EDI and the development of the R library was led by Colin Smith (EDI) in collaboration with Eric Sokol at the NEON program and members of the Biodiversity and Ecosystem Stability Working Groups that were established at the 2019 NEON Science Summit. The NEON program and EDI plan to develop similar tools for other data products in the future.
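The ecocomDP package itself is written in R, and the Python sketch below does not use or reproduce its API. It only illustrates the general idea behind a standard intermediate format: two hypothetical sources that record the same kind of biodiversity observation under different column names are each mapped onto one shared record type, after which they can be combined, sorted, and summarized together. The record fields, source column names, site codes, and values are all made-up placeholders, not the official ecocomDP schema or real NEON or LTER data.

```python
from dataclasses import dataclass
from typing import Dict, List


# Hypothetical shared record type. These field names are placeholders that
# illustrate a common intermediate format; they are NOT the ecocomDP schema.
@dataclass(frozen=True)
class Observation:
    source: str
    site: str
    date: str      # ISO 8601 date string, e.g. "2019-06-01"
    taxon: str
    variable: str
    value: float


def from_source_a(rows: List[Dict]) -> List[Observation]:
    """Map rows from a hypothetical source A onto the shared format."""
    return [
        Observation("A", r["siteID"], r["collectDate"], r["scientificName"],
                    "count", float(r["individualCount"]))
        for r in rows
    ]


def from_source_b(rows: List[Dict]) -> List[Observation]:
    """Map rows from a hypothetical source B, which uses other column names."""
    return [
        Observation("B", r["plot"], r["sample_date"], r["species"],
                    "count", float(r["abundance"]))
        for r in rows
    ]


def combine(*tables: List[Observation]) -> List[Observation]:
    """Concatenate harmonized tables and sort them by site, date, and taxon."""
    merged = [obs for table in tables for obs in table]
    return sorted(merged, key=lambda o: (o.site, o.date, o.taxon))


# Toy input rows (invented values, for illustration only).
a_rows = [{"siteID": "SITE_A1", "collectDate": "2019-06-01",
           "scientificName": "Taxon A", "individualCount": 3}]
b_rows = [{"plot": "PLOT_B1", "sample_date": "2019-06-03",
           "species": "Taxon A", "abundance": 5}]

combined = combine(from_source_a(a_rows), from_source_b(b_rows))

# Once both sources share one format, a single loop can summarize them.
totals: Dict[str, float] = {}
for obs in combined:
    totals[obs.taxon] = totals.get(obs.taxon, 0.0) + obs.value

print(totals)  # {'Taxon A': 8.0}
```

The design choice being illustrated is that each source needs only one small mapping function; everything downstream (sorting, searching, summarizing) is written once against the shared format rather than once per source.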
Promoting FAIR Data for the Ecology Community
The NEON-EDI initiative is one of many ways both organizations are seeking to broaden the user community for ecological data.
Dr. Corinna Gries, a principal investigator of EDI based at the University of Wisconsin, says, "We maintain EDI as a repository of high value data that are documented in sufficient detail to be reusable. Furthermore, we are committed to advancing data publishing in the ecological research community through curatorial support, data management training, extensive outreach activities, a data fellowship program, and supporting a vibrant community of professional data managers."
Dr. Eric Sokol, a NEON staff scientist specializing in quantitative ecology, says, "The FAIR data initiative is important to increase inclusivity in science. Having large data sets available and more equally accessible to everyone provides more opportunities for educators, early-career researchers, and people from institutions that don't have the resources to collect and maintain long-term data sets of their own. By working together with EDI, we can give opportunities to a lot of different people in different positions to do lots of cool science."
NEON Chief Scientist and Observatory Director Dr. Paula Mabee says, "Our partnership with EDI enables us to improve the discoverability and interoperability of environmental data that can foster the frontier science that NEON seeks to enable. We are together seeking to build a community of data users and providers that manage data using best practices and standards in the environmental sciences."